don't code today what you can't debug tomorrow: July 2011

Thursday, July 28, 2011

open and freedom

Labels: musing, qt

After almost 6 years, 500 blog posts, and over half a million visits, I decide to have a new playground for my (often crazy) experiments. From now on, if you want to follow my ramblings, check ariya.ofilabs.com.

Tuesday, July 26, 2011

tablets and web performance

Labels: gadget, javascript, musing, optimization, technology, webkit

Benchmarks, and the results of running them, are attractive because they eliminate the need to digest an arbitrary complex machinery, reducing it into a meaningful and powerful number. Comparison is thereby made easier, as it is now a matter of playing who has the biggest gun game.

In the areas of web performance, every benchmark becomes more like durian, either you hate it or you love it. No matter how useful (or useless) a benchmark is, there are always folks who defend it and others who despise it. Signal-to-noise ratio changes dramatically when the discussion over some benchmark results is started.

I still reckon that in the years to come, what makes a great experience while browsing the web depends on the performance of (surprise!) DOM access. Common JavaScript frameworks (like jQuery, Prototype, Ext JS, MooTools, YUI, Dojo, and many others) still form the basis for a lot of rich web sites and interactive web applications out there, at least for the time being and till the near future.

While SunSpider and V8 benchmarks are geared towards pure JavaScript performance and Kraken is better suited for future heavyweight applications, Dromaeo becomes a solid candidate for DOM performance analysis. In particular, its set of DOM tests is very valuable because it presents a nice sample of the behavior of framework-based sites. In this context, butter-smooth DOM modification has a bigger impact than just blazing-fast trigonometric computation, at least for gajillions web pages out there.

Since more and more people are accessing the web through mobile platforms these days, I decided to test several popular tablets out there and summarize the result in one graph below (updated):

For the detailed comparisons, check out the complete Dromaeo numbers of all tablets (left-to-right: Galaxy Tab, iPad 2, Playbook, TouchPad). If you find the above result is different that what you test yourself, shout out. I want to be careful not to propagate any discrepancies or misleading results. As usual, take the above beautiful collection of colored bars with a pinch of salt.

Samsung Galaxy Tab 10.1 is powered by Android 3.1 (Honeycomb) Build HMJ37, iPad 2 is using iOS 4.3.3, RIM Playbook's firmware is 1.0.7.2670, which the HP TouchPad has webOS 3.0. The choice of the devices represent a variety of fresh ARM-based tablet operating systems in the market as of this writing.

With Qt coming closer and closer to the become a good companion of the green robot, I wonder how would QtWebKit compete with those numbers. I think we will find out the answer in a couple of months, maybe even sooner.

Wednesday, July 06, 2011

fluid animation with accelerated composition

Labels: coding, effect, graphics, ofilabs, opengl, optimization, qt, qtwebkit, webkit, x2

Those who work on web-based applications on mobile platforms often recall the advice, "Use translate3d to make sure it's hardware accelerated". This advice seems magical at first, and I seldom find anyone who explains (or wants to explain) the actual machinery behind such a practical tip.

For a while, Safari (and Mobile Safari) was the only WebKit-based browser which supports hardware-accelerated CSS animation. Google Chrome caught up, QtWebKit-powered browser (like the one in Nokia N9) also finally supported it. Such a situation often gave the wrong impression that Apple kept the hardware-acceleration code for themselves.

The above two are basically the reasons for this blog post.

In case you miss it (before we dive in further), please read what I wrote before about different WebKit ports (to get the idea of implementation + back-end approach) and tiled backing store (decoupling web page complexity with smooth UX). The GraphicsContext abstraction will be specially useful in this topic. In particular, because animation is tightly related to efficient graphics.

Imagine if you have to move an image (of a unicorn, for example) from one position to another. The pseudo-code for doing it would be:

  for pos = startPosition to endPosition
    draw unicorn at pos

To ensure smooth 60fps, your inner loop has only 16 ms to draw that unicorn image. Usually this is a piece of cake because all the CPU does is sending the pixels of the unicorn image once to the GPU (in the form of texture) and then just refer the texture inside the animation loop. No heavy work is needed on the CPU and GPU sides.

If, however, what you draw is very complicated, e.g. formatted text consisting of different font typefaces and sizes, this gets hairy. The "draw" part can take more than 16 ms and the animation is not butter-smooth anymore. Because your text does not really change during the animation, only the position changes, the usual trick is to cache the text, i.e. draw it onto a buffer and just move around the buffer as needed. Again, the CPU just needs to push the buffer the GPU once:

    prepare a temporary buffer
    draw the text onto the buffer
    for pos = startPosition and endPosition
       set a new transformation matrix for the buffer

As you can imagine, that's exactly what happens when WebKit performs CSS animation. Instead of drawing your div (or whatever you animate) multiple times in different position, it prepares a layer and redirect the drawing there. After that, animation is a simple matter of manipulating the layer, e.g. moving it around. WebKit term for this (useful if you comb the source code) is ~~accelerated composition~~ accelerated compositing.

Side note: Mozilla has the same concept, available since Firefox 4, called Layer.

If you understand immediate vs retain mode rendering, non-composited vs composited is just like that. The idea to treat the render tree more like a scene graph, a stack of layers with different depth value.

Because composition reduces the computation burden (GPU can handle varying transformation matrix efficiently), the animation is smoother. This is not so noticeable if you have a modern machine. In the following video demo (http://youtu.be/KujWTTRkPkM), I have to use my slightly old Windows laptop to demonstrate the frames/second differences:

The excellent falling leaves animation is something you have seen before, back when WebKit support for CSS animation was announced.

Accelerated composition does not magically turn every WebKit ports capable of doing fluid animation. Analog to my previous rounded corner example, composition requires the support from the underlying platform. On Mac OS X port of WebKit, composition is mapped into CoreAnimation (part of CoreGraphics), the official API to have animated user interface. Same goes for iOS WebKit. On Chromium, it is hooked into sandboxed GPU process.

With QtWebKit, composition is achieved via Graphics View framework (read Noam's explanation for details). The previous video you have seen was created with QtWebKit, running without and with composition, i.e. QGraphicsWebView with different AcceleratedCompositingEnabled run-time setting. If you want to check out the code and try it yourself, head to the usual X2 repository and look under webkit/composition. Use spacebar (or mouse click) to switch between composited and non-composited mode. If there is no significant frame rate improvement, increase NUMBER_OF_LEAVES in leaves.js and rebuild. When compositing is active, press D to draw thin yellow border around each layer. Since it's all about Graphics View, this debugging is easy to implement. I just inject a custom BorderEffect, based on QGraphicsEffect (which I did prototype back when I was with Nokia):

Thus, there is nothing like hidden secret with respect to Safari hardware-accelerated CSS support. In fact, Safari is not different than other Mac apps. If you compile WebKit yourself and build an application with it, you would definitely get the animation with hardware acceleration support.

As the bonus, since Mac and iOS WebKit delegate the animation to CoreAnimation (CA), you can use various CA tweaks to debug it. CA_COLOR_OPAQUE=1 will emphasize each layer with red color overlay (as in the demo). While this applies to any CA-based apps (not limited to WebKit or Safari), it's still very useful nevertheless. Chromium's similar feature is --show-composited-layer-border command line option.

How does WebKit determine what to composited? Since the goal is to fully take advantage of the GPU, there are few particular operations which are suitable for such a composition. Among others are transparency (opacity < 1.0) and transformation matrix. Ideally we would just use composition for the entire web page. However, composition implies a higher memory allocation and a quite capable graphics processor. On mobile platforms, these two translate into additional critical factor: power consumption. Thus, one just needs to draw a line somewhere and stick with it. Hence, that's why currently (on iOS) translate3d and scale3d are using composition and their 2-D counterparts are not. Addendum: on the desktop WebKit, all transformed element is accelerated, regardless whether it's 2-D or 3-D.

If you make it this far, here are few final twists.

First of all, just like the tiled backing store approach I explained before, accelerated composition does not force you to use the graphics processor for everything. For efficiency, your layer (backing store) might be mapped to GPU textures. However, you are not obligated to prepare the layer, i.e. drawing onto it, using the GPU. As an example, you can use a software rasterizer to draw to a buffer which will be mapped to OpenGL texture.

In fact, a further variation of this would be not to use the GPU at all. This may come as a surprise to you but Android 2.2 (Froyo) added composition support (see the commit), albeit doing everything with in software (via its Skia graphics engine). The advantage is of course not that great (compared to using OpenGL ES entirely), however the improvement is really obvious. If you have two Android phones (of the same hardware specification), one still running the outdated 2.1 (Eclair) and the other with Froyo, just open the Falling Leaves demo and watch the frame rate difference.

With the non-GPU composition-based CSS animation in Froyo, translate3d and other similar tricks do not speed-up anything significantly. In fact, it may haunt you with bugs. For example, placing form elements in a div could wreck the touch events accuracy, mainly because the hit test procedures forget to take into account that the composited layer has moved. Things which seem to work just fine Eclair may start behaving weird under Froyo and Gingerbread. If that happens to you, check your CSS properties.

Fortunately (or unfortunately, depending on your point of view), Android madness with accelerated composition is getting better with Honeycomb and further upcoming releases. Meanwhile, just take it for granted that your magical translate3d spell has no effect on the green robots.

Last but not least, I'm pretty excited with the lightweight scene graph direction in the upcoming Qt 5. If any, this will become a better match for QtWebKit accelerated composition compared to the current Graphics View solution. This would totally destroy the myth (or misconception) that only native apps can take advantage of OpenGL (ES). Thus, if you decide to use web technologies via QtWebKit (possibly through the hybrid approach), your investment would be future-attractive!

Sunday, July 03, 2011

birds of paradise

Labels: coding, javascript, network, ofilabs, phantomjs, qt, qtwebkit, webkit

PhantomJS, the headless QtWebKit tool, is now listed as one of the Qt Ambassador show cases. There is also a growing list of projects using PhantomJS (let me know if you want to be listed). In fact, PhantomJS running in several Amazon EC2 instances is used as the primary tool in a web security analysis.

Few days ago, right during the summer solstice, I tagged and released PhantomJS 1.2, codenamed Birds of Paradise. There are some exciting changes there that I will briefly outline as follows (for details, see the Release Notes).

Most important change is the fix to the security model. Your PhantomJS script now does not run in the context of the loaded QWebPage anymore (more precisely, the main frame thereof). Rather, we have a new WebPage object that abstracts (surprise) the web page. This forces a major breakage in the API since there is no way to support 1.1 style of API with the same code. Again, check the Release Notes to find out how to migrate your script.

The bonus with the above WebPage object abstraction is a bunch of external callbacks we can set up, most notably onLoadStarted and onLoadFinished, useful to trigger some actions upon page loading. Speaking of JavaScript evaluation, dynamic script tag loading is a quite popular trick to asynchronously load external libraries, e.g. those part of Google Libraries API. Rather than writing your own code, there is now easy-to-use includeJS() function for that specific purpose.

My personal #1 favorite feature is the simple network traffic analysis. I already demonstrated the technique before, i.e. by subclassing QNetworkAccessManager and recording the major network activities (see my previous blog post discussing this in details). The new PhantomJS example script netsniff.js shows the use of this feature: it logs network requests and responses and dumps them in HTTP Archive (HAR) format. You can then use online HAR viewer to see the waterfall diagram, or post-process the data the way you like it. For example, here is what BBC News web site produced:

Since the entire traffic capture can be fully automated, you can have it checked against various rules. For example, you may want to ensure that KDE.org does not get significantly slower (in terms of page loading) every time someone changes the web site design. Perhaps you want to compare the resource loading with gnome.org just to confirm that it loads faster. Gathering the stats of the same site from different geographical locations might also reveal how the web page is perceived by some fans on the other side of the planet.

The use of mobile devices for consuming information is exploding. It would be fun to see PhantomJS leveraged to get the metrics behind the data traffic. I haven't bothered yet to port and test PhantomJS on Qt-based Harmattan-powered phone like the new shiny Nokia N9. Of course, if you a spare one, feel free to Fedex me :)

don't code today what you can't debug tomorrow