
Friday, November 24, 2006

Memory efficient DOM (Part 2)

This is the follow-up to the first part of my writing on a memory efficient DOM implementation for KOffice. This time my intention is to find out whether real-time compression can help to reduce the memory consumption even further.

XML compression is a whole field of research. There are a bunch of compressors available: XMill, XMLZip, XMLPPM, XGrind, XML-Xpress, Exalt, TREECHOP and many others. Since I'm not a scholar in this field (I already have my own graduate research to take care of), I have chosen a direct and brutal approach: just compress the packed node representation and decompress it on the fly. Since iteration is very common, a simple cache buffer for decompressed neighbouring nodes is implemented so that there is no need to decompress every single node over and over. This method easily fulfills the requirements that I set: no need to load the whole XML (because it intercepts SAX events), it allows queries and iteration (thanks to the QDom-like wrapper), and it can be conveniently tested with the compression switched off.
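
To make the scheme concrete, here is a minimal sketch of the idea (all names here are hypothetical, not the actual KoXmlReader internals): the packed nodes are grouped into blocks, each block is compressed, and a tiny cache keeps the most recently decompressed block around so that iterating over neighbouring nodes does not pay for a decompression every time.

    // Hypothetical sketch: compressed blocks of packed nodes, with a
    // one-entry cache holding the most recently decompressed block.
    #include <vector>

    typedef std::vector<unsigned char> Buffer;

    class NodeStore {
    public:
        // Returns the decompressed block that contains node 'index',
        // running the decompressor only on a cache miss.
        const Buffer& blockFor(int index) {
            int b = index / nodesPerBlock;
            if (b != cachedBlock) {
                cachedData = decompress(blocks[b]);
                cachedBlock = b;
            }
            return cachedData;
        }
    private:
        static const int nodesPerBlock = 256;  // arbitrary block size
        std::vector<Buffer> blocks;            // compressed packed nodes
        int cachedBlock = -1;                  // index of the cached block
        Buffer cachedData;                     // its decompressed bytes

        // Stand-in for the real decompressor (e.g. an LZF-style decoder).
        static Buffer decompress(const Buffer& in) { return in; }
    };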

There are many algorithms suitable for fast compression and decompression. Byte-aligned Lempel-Ziv is the logical and obvious choice. Again, I have practically zero knowledge about data compression. There are Ross Williams' LZRW1, Hermann Vogt's LZV, Markus Oberhumer's LZO, Marc Lehmann's LZF and many others. LZRW1 is well known, although probably covered by patents. LZV has the poorest compression ratio, so it goes off the list. LZO's compression is the best, at the price of a slightly more complicated state machine, but my own implementation (necessary since the reference implementation is GPL only) is neither fast nor well debugged. LZF, the successor to LZV, is a good contender and is already used (among others) in the suspend2 project. However, even on a fairly modern machine with a large CPU cache and a reasonable branch predictor, liblzf (the reference implementation of LZF) is much slower than LZRW1. So what I did was to recreate the compressor. Armed with Valgrind's Cachegrind, my new implementation is finally faster than LZRW1.
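
For a taste of why a byte-aligned scheme decompresses so quickly, here is the core of an LZF-style decoding loop, reconstructed from the published format (simplified and without any bounds checking, so a sketch rather than production code):

    // LZF-style decoding: a control byte below 32 announces a run of
    // literal bytes; anything else is a back reference into the output
    // produced so far. Everything is byte-aligned, hence no bit fiddling.
    #include <cstddef>
    #include <vector>

    std::vector<unsigned char> lzfDecompress(const unsigned char* in, int inLen)
    {
        std::vector<unsigned char> out;
        int ip = 0;
        while (ip < inLen) {
            unsigned int ctrl = in[ip++];
            if (ctrl < 32) {
                // literal run: copy ctrl + 1 bytes verbatim
                for (unsigned int i = 0; i <= ctrl; ++i)
                    out.push_back(in[ip++]);
            } else {
                // back reference: length in the top 3 bits (7 means an
                // extra length byte follows), offset in the low 5 bits
                // plus one more byte
                unsigned int len = ctrl >> 5;
                if (len == 7)
                    len += in[ip++];
                std::size_t offset = ((ctrl & 0x1f) << 8) + in[ip++] + 1;
                std::size_t ref = out.size() - offset;
                for (unsigned int i = 0; i < len + 2; ++i)
                    out.push_back(out[ref + i]);  // may overlap, on purpose
            }
        }
        return out;
    }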

What I am really eager to try (but have no time for yet) is the Wilson-Kaplan algorithm. At least the WK4x4 and WKdm variants have been implemented for the Linux compressed caching project. And since the compression/decompression inside KoXmlReader involves in-memory data anyway, this fast and low-overhead WK algorithm should be a good candidate. There are assorted patent applications for such cache compression, but I wonder whether the algorithm per se is patented.

Now on to the results, starting with the worst test document: Frequencies.ods. Just to remind you, loading and iterating this document (19 MB of XML) takes 250 MB if we're using QDom but only 40 MB with KoXmlReader. Here is what Valgrind/Massif reported when the compact form of the DOM is compressed and decompressed on the fly. At most, this barely consumes 8 MB. Compared to 250 MB, 40 MB is good enough, but 8 MB is great, isn't it?

Interestingly, during the parsing phase the heap allocation increases only (almost) linearly.

The full comparison with the other documents is shown below. It is amazing to see how short the violet bars are.

As for speed, despite the lightning-fast packing and unpacking, there is some penalty. The graph below summarizes the time required for parsing (light bars) and iterating (dark bars). Compared to what was shown in Part 1, I have only added the result for KoXmlReader with compression. Small documents are omitted because their processing time is short anyway.

Of course, all these graphs must be taken with a grain of salt. But my objective is only to find out whether this all makes sense, not to write a full-blown research paper. So far I am quite happy with the results.

It can be seen that KoXmlReader is a feasible alternative to QDom (in KOffice), as it does not incur any significant performance penalty, yet its memory footprint in the worst case (Frequencies.ods) is only one sixth (40 MB) of QDom's (250 MB). We also see the additional benefit of using real-time compression: compared to plain KoXmlReader, although it is about one fifth slower in the worst case (and there is still room for further improvement here), the allocated heap is only one fifth as large (8 MB vs 40 MB). Or even one thirtieth, if you bravely compare it against QDom (8 MB vs 250 MB).

I guess we should empirically find out how large the documents that most people work with really are. If they are smaller than the worst case that I tested, then enabling compression by default is a good thing, because the performance penalty won't be noticeable. If not, even without compression it is already much better than using QDom.

Other than that, my aim now is to fully integrate KoXmlReader into KOffice. Hopefully someday bug #59510 can really be closed.

Friday, November 10, 2006

KOffice in LinuxUser

The November edition of LinuxUser features, among other things, a review of six alternatives to OpenOffice: Applixware, Siag Office, Softmaker Office, Thinkfree Office, Gnome Office and, of course, our beloved KOffice.

KOffice is praised for its perfect KDE integration, its resource friendliness, its comprehensive functionality and the outlook of its future development.

Granted, the review was of KOffice 1.5.2 (although we already have 1.6, and even 1.6.1 soon), probably because KOffice 1.6 had not been released yet when the review was written. Nevertheless, that does not stop the good closing words:

Für Privatanwender bietet einzig KOffice eine echte Alternative zu OpenOffice. ("For home users, only KOffice offers a real alternative to OpenOffice.")

Wednesday, November 08, 2006

Memory efficient DOM

A known problem of KOffice applications is the inability to cope with very large data (e.g. bug #59510). This is often related to filters, e.g. when you are importing Excel documents (see bug #85372). Fortunately, as I wrote there, this has already been solved: the filter can now produce large documents without causing the machine to consume every single byte of RAM.

Still, a very large file can force the application to use hundreds of megabytes of memory. One of the reasons is that we use QDom to hold some DOMs associated with the document. Unfortunately, once the DOM gets very big, QDom takes too much memory. And once some parts are paged out to swap, performance degrades significantly. So when people claimed that opening a 1 MB file (which is actually the compressed size, not the actual size of the XML document) consumes more than 300 MB of memory, that was no exaggeration. After looking at some of the colorful charts below, you will hardly be astonished.

Once the document is loaded, we do not need the DOMs anymore and they can be destroyed. Thus, although loading takes quite some time, once the loading process is finished the application won't take up so much memory anymore.

This topic has been discussed a couple of times in the past. Last year, I started to work on a memory efficient implementation of the DOM, designed to be used in KOffice. After some attempts, it looks like it may work after all. And it is quite promising indeed.

So, how can this DOM thing be improved?

First, it is likely that we do not need to modify any elements once they are loaded from the file. The way it works is always like this: get the document, load some XML files from the storage (e.g. content.xml, settings.xml, styles.xml, ...), parse them, and construct the application-specific objects from those XML files. For example, for a spreadsheet this means that we create all the sheets, rows, columns, cells, and other objects. When the application saves the document, it creates the DOM from scratch and writes it to the file. Thus, when opening a file, the corresponding DOMs are not modified at all.

Second, many namespace URIs, attribute names, and element values are duplicated. That alone is already a very good reason to cache common strings. For example, in an OpenDocument spreadsheet almost every cell has attributes named table:style-name and office:value-type, so we do not need to duplicate these strings for every single cell.
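
The cache itself does not have to be fancy; a plain interning pool already does the job. A minimal sketch (not the actual KOffice code):

    // Minimal string interning pool: identical strings are stored only
    // once, so thousands of cells carrying the attribute name
    // "table:style-name" end up sharing a single copy.
    #include <string>
    #include <unordered_set>

    class StringCache {
    public:
        // Returns a reference to the one shared copy of 's'.
        const std::string& intern(const std::string& s) {
            return *pool.insert(s).first;
        }
    private:
        std::unordered_set<std::string> pool;
    };

    // Usage: interning the same name twice yields the same object.
    //   StringCache cache;
    //   const std::string& a = cache.intern("table:style-name");
    //   const std::string& b = cache.intern("table:style-name");
    //   // &a == &b, and only one copy of the string is kept alive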

The replacement for QDom, dubbed KoXmlReader, uses these tricks. The whole DOM tree is stored in a very compact form (where the per-node overhead is minimized and all duplicated strings are stored efficiently), not as full-blown instances of nodes, elements, texts, etc. However, there are KoXmlNode, KoXmlElement, and KoXmlDocument classes which correspond to their QDom counterparts. The key is that those DOM nodes are created (from the compact representation) only when we need them, and afterwards they can be destroyed again. In short: the nodes are constructed only on demand. This way, compatibility with the QDom classes is preserved (otherwise we would need to modify and test thousands of lines of code) while the memory requirement is kept to a minimum.
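
A rough sketch of what such a compact form might look like (hypothetical, not the actual KoXmlReader layout): the tree is a flat array of small records, and an element wrapper is little more than an index into that array, materialized only when asked for.

    // Hypothetical sketch: the whole tree lives in a flat array of small
    // records; a wrapper object is created on demand and is cheap to copy.
    #include <string>
    #include <vector>

    struct PackedNode {
        int nameId;                            // index into shared strings
        int parent, firstChild, nextSibling;   // tree links as indices
    };

    struct CompactDom {
        std::vector<PackedNode> nodes;
        std::vector<std::string> strings;      // deduplicated names/values
    };

    class LightElement {                       // built only on demand
    public:
        LightElement(const CompactDom* d, int i) : dom(d), index(i) {}
        std::string tagName() const {
            return dom->strings[dom->nodes[index].nameId];
        }
        LightElement firstChild() const {
            return LightElement(dom, dom->nodes[index].firstChild);
        }
    private:
        const CompactDom* dom;
        int index;
    };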

(Side note: yes, we considered using a SAX parser directly. But that would mean changing a lot of code and is prone to mistakes. And it seems nobody is willing to do that. My personal feeling is that the gained performance is not worth the hassle, though I would not be sad to be proven wrong here.)

How can this help? Say we're dealing with a spreadsheet. When we parse the first sheet we need all the nodes and elements belonging to this sheet, but we don't care about the nodes and elements associated with the other sheets. So we just load what we need (this loading even happens automagically). After we finish with it, we can move on to the next sheet and at the same time purge everything from the first sheet (because we don't need it anymore), as sketched below.
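
In code the pattern could look roughly like this (the purge helper is a name I made up for illustration; the node accessors mirror the QDom-style API):

    // Sketch of per-sheet processing: each sheet's nodes are materialized
    // when first touched and can be thrown away before the next sheet.
    // purgeLoadedNodes() is hypothetical, not the actual KoXml API.
    for (KoXmlNode n = body.firstChild(); !n.isNull(); n = n.nextSibling()) {
        KoXmlElement sheet = n.toElement();    // constructed on demand
        if (sheet.tagName() == "table:table")
            parseSheet(sheet);                 // application-specific work
        purgeLoadedNodes(sheet);               // hypothetical: release the
                                               // materialized nodes again
    }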

I tried to make the API for node, element, text, etc. very much compatible with QDom. There is even a compiler define so that the classes just wrap all those QDom classes (to ease the transition), which allowed Stefan to replace all QDom references with KoXml in one amazing sweep. Unit tests - with more than 1000 checks - provide (hopefully) almost 100% coverage and should help catch bugs or incompatibilities as early as possible (also, I notice that I have one more example of what I wrote some time ago: the whole test suite is longer than the code itself).
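
The wrapping define can be as simple as a handful of typedefs; a sketch of the idea (the macro name is my assumption, not necessarily the one in the tree):

    // Sketch of the transition switch: with the define set, the KoXml
    // names are mere aliases for their QDom counterparts, so all of
    // KOffice could be ported first and flipped over later.
    #ifdef KOXML_USE_QDOM
    typedef QDomNode     KoXmlNode;
    typedef QDomElement  KoXmlElement;
    typedef QDomText     KoXmlText;
    typedef QDomDocument KoXmlDocument;
    #else
    // ... the compact, on-demand implementation described above
    #endif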

So how does it perform?

Obviously this is not a scientifically rigorous benchmark, but just to give some illustration, here are a few spreadsheet documents that I tested (I have actually tried many, many documents; these are just some of them). The column Content specifies the size of the content.xml of each document. As you can see, once it is uncompressed, the file can be very big. The other columns are, I hope, self-explanatory.

Document               File Size  Content   Sheets  Cells
permfall.ods           18 KB      92 KB     4       271
test.ods               18 KB      144 KB    13      1138
Customer_Solution.ods  90 KB      1478 KB   18      11735
Financial_Report.ods   143 KB     2947 KB   24      8913
bug85372.ods           247 KB     4754 KB   81      347193
Frequencies.ods        627 KB     18952 KB  3       185334

So what happens if we load each of these documents, hand it to our XML parser, construct the DOM tree, and iterate over every single cell in the document? We won't load it into KSpread, and we won't create any application data from it. We just treat it as a plain DOM representation of a spreadsheet.

Using Valgrind's Massif, here is what I got when I processed one of them using the QDom classes (QDomDocument, QDomElement, etc). As you can witness, the memory consumption skyrocketed to around 250 MB, although the compressed ODS file is not even 1 MB. In reality (i.e. within KOffice) the situation is of course much worse, because KOffice also needs some more megabytes for its own code and data.

And this is how it looks when I patch QDom so that the strings are cached (get the simple patch, if you're curious). It does not seem to help much in this case.

However, using the KoXmlReader classes (KoXmlDocument, KoXmlElement, etc), we need at most only around 40 MB (see the vertical axis).

And the following is the complete comparison chart for the other documents. Note that I omit Frequencies.ods - which you can already enjoy in the graphs above - because its bars would be very, very long compared to the others.

(There are perhaps better ways to instrument memory usage, so please take my lame attempt above with a grain of salt).

So we can see that the family of KoXml classes can be useful. But what are the catches? For sure, we can't change the DOM tree; that much is clear. However, as I wrote in the beginning, this does not matter much for KOffice. In other words, you can't just blindly substitute KoXmlReader for your QDom. It is designed and implemented for a very specific use case. It is not even a fully compliant DOM implementation.

Another disadvantage is that iterating over nodes is a bit slower than with QDom. The good news: parsing is faster. This is because the whole process is likely memory bound. True, KoXmlReader has some processing overhead for packing and unpacking the nodes (i.e. constructing them on the fly). But it also touches only a fraction of the RAM that QDom does, and we all know how slow memory operations are.

A few runs on a P4 1.8 GHz give the following chart. The light color is for parsing, the darker one for iterating. It looks like KoXmlReader does not really lag behind.

This also makes me rather excited; I wonder whether my next attempt (using additional real-time compression - wait for Part 2 of this writing) will still give acceptable performance in terms of speed.

There are still a few issues which must be solved before this stuff can be fully utilized in the upcoming KOffice 2.0. I'm working on that. But just as an appetizer, here is the memory chart for running KSpread and opening one of the test documents (Customer_Solution.ods). Using QDom, the peak was about 110 MB before it settled down to around 80 MB when loading completed. I have no idea what caused the deep spike in the middle; this must be investigated (is it really from KSpread? Or possibly Valgrind?).

However, when the KoXmlReader classes are used instead of QDom, the heap allocation (during parsing and iterating) touches just about 90 MB, as evidenced below:

Please do not assume that 80 MB is what KSpread 2.0 will need to work with such a simple document. What I am using is a rather work-in-progress version, so it has lots of extra stuff. KOffice 2.0 is still very far from its release date and there is still a lot of ongoing optimization work.

To be continued...