show_menu logo_rd

HTML5 Canvas: Performance and Optimization. Part 2: Going Deeper

6

Last time when we were talking about JavaScript optimization, we used some basic optimization techniques to achieve better performance of Flood Fill algorithm on HTML5 Canvas. Today we’re going to go deeper.

We discussed the results internally and came up with several low-level fixes. Here they go:

Minimize the number of created objects

First of all we thought about the enormous number of objects we create during the execution of our algorithm. As you probably remember, we applied a fix named “Temp object creation”, when we tried to minimize the number of performed arithmetic operations. It had a negative effect on performance because of increased memory allocation and garbage collector overhead. So, the less objects you create, the better performance you get. It is not hard to notice that the majority of objects in our code are created here:

stack.push({ x: cur.x + dx[i], y: cur.y + dy[i]});

What if we don’t create new objects here at all? Let’s store in stack individual coordinates instead of creating wrapper objects. Sure, it makes code more complicated and unreadable, but performance is our main goal today. So, we came up with this:

var stack = [];
stack.push(x);
stack.push(y);
// ...
while (stack.length > 0) {
    var curPointY = stack.pop();
    var curPointX = stack.pop();
 
    for (var i = 0; i < 4; i++) {
        var nextPointX = curPointX + dx[i];
        var nextPointY = curPointY + dy[i];
 
        // ...
        
        stack.push(nextPointX);
        stack.push(nextPointY);
    }
}

Results:

Please note, we removed two bad fixes from the previous article. We’ve got nice results in all browsers, but in Safari the results were really amazing: about 45% performance boost. If you recall the bad “Temp object” fix from the previous article, we noticed that Safari was dramatically slower than other browsers after that fix, so such result is a logical consequence of some issues Safari might have with object allocation and/or garbage collection.

Inline functions

Let’s go deeper. Most modern compilers do function inlining automatically or provide you with an ability to mark the function with some kind of inline attribute (think C’s inline keyword or AggressiveInliningAttribute from .NET 4.5). JavaScript doesn’t allow you to do that. Although function inlining might have a dramatic performance effect in case you call the function very often. We call isSameColor about 2 million times and setPixelColor about 700K times. Let’s try to inline them manually:

for (var i = 0; i < 4; i++) {
    var isContiguousPixel = false;
    var nextPointX = curPointX + dx[i];
    var nextPointY = curPointY + dy[i];
 
    // Inline implementation of isSameColor.
    var nextPointOffset = (nextPointY * W + nextPointX) * 4;
    if (imgData[nextPointOffset + 0] == ((hitColor >> 24) & 0xFF)
        && imgData[nextPointOffset + 1] == ((hitColor >> 16) & 0xFF) 
        && imgData[nextPointOffset + 2] == ((hitColor >> 8) & 0xFF)) {
        isContiguousPixel = true;
    }
 
    if (nextPointX < 0 || nextPointY < 0 || nextPointX >= W || nextPointY >= H || !isContiguousPixel) {
        continue;
    }
 
    // Inline implementation of setPixelColor.
    imgData[nextPointOffset + 0] = (newColor >> 24) & 0xFF;
    imgData[nextPointOffset + 1] = (newColor >> 16) & 0xFF;
    imgData[nextPointOffset + 2] = (newColor >>  8) & 0xFF;
 
    stack.push(nextPointX);
    stack.push(nextPointY);
}

Again, it makes our code less readable and understandable, but we really want better performance, so we don’t care about code readability here.

Isn’t it amazing? Absolutely incredible results: we’ve got 70% boost on Firefox, 56% on IE and 36% on Safari. And what about Chrome? Surprisingly it is 9% slower. It seems that Google already implemented automatic function inlining in V8 optimizer and our manual fix is worse than theirs. Another interesting thing: with that fix Firefox is almost two times faster than Chrome, the previous leader.

Optimize CPU cache utilization

There is one thing we never thought before in context of such a high-level language as JavaScript: CPU cache utilization. It is quite an interesting topic and it deserves separate article. As for now, you can read more about it here, for example.

ImageData array has two dimensions that are packed into one dimensional array line by line. So, 3×3 pixel matrix with coordinates (x; y), is basically stored in memory like that:

Let’s look at the following lines in our code:

var dx = [-1, 0, +1, 0];
var dy = [0, -1, 0, +1];

Let’s think about the order we access neighbor pixels if we are in (1;1):

So, we’re going left, then once again left, then right, then again right. According to best practices of cache utilization optimization, it is better to access memory sequentially, because it minimizes the chance to have a cache miss. So, what we need is something like that:

Let’s rewrite our dx, dy arrays:

var dx = [ 0, -1, +1,  0];
var dy = [-1,  0,  0, +1];

Here is what we’ve got:

Well, the only browser reacted significantly was Chrome: we’ve got about 10% better performance with it. Other browsers were up to 3% faster which is in the area of statistical error. Anyway, this is interesting result that means we should pay attention even to such low-level optimizations when we write code on JavaScript – they are still important.

Fixing own bug – reorder if statements

Those of you who read the inlined code carefully, might have already noticed that we actually made a mistake there:

// Inline implementation of isSameColor.
var nextPointOffset = (nextPointY * W + nextPointX) * 4;
if (imgData[nextPointOffset + 0] == ((hitColor >> 24) & 0xFF)
    && imgData[nextPointOffset + 1] == ((hitColor >> 16) & 0xFF) 
    && imgData[nextPointOffset + 2] == ((hitColor >> 8) & 0xFF)) {
    isContiguousPixel = true;
}
 
if (nextPointX < 0 || nextPointY < 0 || nextPointX >= W || nextPointY >= H || !isContiguousPixel) {
    continue;
}

We check the pixel color and only then make sure that we don’t go outside array bounds. In a statically typed language we would get some kind of IndexOutOfBoundsException or Access Violation error here. But in JavaScript arrays are basically hash tables, so nothing prevents you from accessing a non-existing index here. But due to the cost of array element access operation it makes sense to check array bounds before checking colors:

if (nextPointX < 0 || nextPointY < 0 || nextPointX >= W || nextPointY >= H) {
    continue;
}
// Inline implementation of isSameColor.
var nextPointOffset = (nextPointY * W + nextPointX) * 4;
if (imgData[nextPointOffset + 0] == ((hitColor >> 24) & 0xFF)
    && imgData[nextPointOffset + 1] == ((hitColor >> 16) & 0xFF) 
    && imgData[nextPointOffset + 2] == ((hitColor >> 8) & 0xFF))
{
    // Inline implementation of setPixelColor.
    imgData[nextPointOffset + 0] = (newColor >> 24) & 0xFF;
    imgData[nextPointOffset + 1] = (newColor >> 16) & 0xFF;
    imgData[nextPointOffset + 2] = (newColor >>  8) & 0xFF;
 
    stack.push(nextPointX);
    stack.push(nextPointY);
}

The results are surprising:

Most of the browsers results were in the area of statistical error, but Chrome was more than two times faster and got its crown of the fastest browser back in our benchmark! It is hard to tell what the reason of such a dramatic difference is. Maybe other browsers use instruction reordering and already applied that optimization by themselves, but it looks strange Chrome doesn’t do it.

There also may be some hidden optimization heuristics in Chrome that help it understand that stack is used as simple array, not a hashtable, and it makes some significant optimization based on that fact.

Conclusions

  • Low-level optimizations matter even if you write your code in high-level language such as JavaScript. The code is still executed on same hardware as if you wrote it in C++.
  • Each JavaScript engine is different. You can have significant performance boost in one browser, but at the same time your fix may make your code slower in another browser. Test your code in all browsers your application must work with.
  • Keep the balance between code readability and performance considerations. Sometimes it makes sense to inline function even if it makes your code less readable. But always make sure that it brings you the desired value: it doesn’t make sense to sacrifice code readability for 2 ms performance boost for code that is already fast enough.
  • Think about object allocation, memory usage and cache utilization, especially if you work with memory intensive algorithms such as Flood Fill.

You can find all the results with exact numbers on Google Spreadsheets: https://docs.google.com/open?id=0B1Umejl6sE1raW9iRkpDSXNyckU

You can check our demo code on GitHub: https://github.com/eleks/canvasPaint

You can play with the app deployed on S3: https://s3.amazonaws.com/rnd-demo/canvasPaint/index.html

Thanks to Yuriy Guts for proposed low-level fixes.

Stay tuned!

Victor Haydin

Victor Haydin is the Head of Products and Services at ELEKS where he leads the company towards technical innovations. He likes experimenting with cutting-edge technology and creating amusing projects. Formerly an avid participant of programming contests like ACM ICPC, TopCoder and Google AI Challenge, now Victor enjoys travelling and reading. Lately, Victor’s main project was parenting. Due to his inborn graphomania, Victor is the most active ELEKS Labs author, writing about HPC, cloud, wearables and other cutting-edge technology topics.

Yuriy Guts

Yuriy Guts is an R&D engineer, .NET stream lead and simply an übergeek at ELEKS. Being a polyglot programmer, he’s been dealing with code for the biggest part of his life. Some witnesses even claim his first words were “Hello world!\n”. Yuriy enjoys architecting complex cloud solutions for various companies across the globe. When not at work, he can often be seen (and heard by his neighbors) jamming on his guitars.

tags

Comments: 6

  • Vyacheslav Egorov

    V8 does not like out-of-bounds array accesses. If you code constantly accesses arrays out-of-bounds then this function will end up running in the non-optimized code.

    The reason for that is mostly due to complicated semantics of JavaScript: out of bounds access has to traverse prototype chain etc.

    The reodering of conditions that you did actually changes semantics of the code so I doubt any engine actually does it.

  • julien

    The array pop,/push approach well probably make the browser hang in some cases, i’d use slice once I’ve reached the maximum number of elements I want in the array

  • Nyet

    Could maybe hitcolor and newcolor 3 calcs be precalculated outside the loop?

  • Nyet

    Nextpointoffset can be autoincremented instead of calculated inside the loop