My blog has been moved to ariya.ofilabs.com.

Thursday, May 25, 2006

Profiling made fun

One of the loudest complaints about KSpread is its inability to cope with large files. Worse, if you try to open a large Microsoft Excel file with it, it just hangs there for minutes. Often it also consumes all your memory without mercy.

The solution I introduced for KOffice 1.5 was to write the OpenDocument Spreadsheet file directly through David's excellent KoXmlWriter, thereby bypassing QDom altogether. As you probably know already, QDom is very inefficient at holding a large XML document. Writing the XML directly avoids that problem.
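To illustrate the difference in approach, here is a minimal Python sketch of streaming XML output, as opposed to building a whole DOM tree first. It is not KoXmlWriter (which is C++) and not the actual filter code; the `write_rows` helper and the element names are made up for the illustration.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_rows(out, rows):
    """Stream a table to `out` as XML, one element at a time.

    Only the value currently being written is handed to the writer,
    unlike a DOM approach that keeps the whole tree in memory until
    it is serialized. The element names are illustrative only, not
    real OpenDocument markup.
    """
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("table", {})
    for row in rows:
        gen.startElement("row", {})
        for value in row:
            gen.startElement("cell", {})
            gen.characters(str(value))  # special characters are escaped
            gen.endElement("cell")
        gen.endElement("row")
    gen.endElement("table")
    gen.endDocument()


buf = io.StringIO()
write_rows(buf, [[1, 2], ["a<b", 4]])
xml_text = buf.getvalue()
```

With a real file object in place of the `StringIO` buffer, the output goes to disk as it is produced, so memory usage does not grow with the size of the document.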

The filter is thus improved, but the Excel import filter is still slower than it should be. For the fresh-from-the-oven KOffice 1.5.1, I included many small improvements that increase the conversion speed and reduce memory usage.

You can also test this yourself by running KoConverter, the underrated conversion tool that comes with every KOffice installation:

koconverter report.xls report.ods

The command above converts report.xls (i.e. your Excel file) to OpenDocument Spreadsheet and saves the result to report.ods. Of course, our Excel filter is still very simple (read: it sucks), but the trick works for simple Excel documents (and it is faster than launching OpenOffice.org).
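If you want to measure how long a conversion takes on your own files, a small wrapper like the following can help. This is only a sketch: `time_command` is a hypothetical helper name, and it assumes that koconverter is on your PATH.

```python
import subprocess
import time


def time_command(argv):
    """Run a command and return (exit_code, elapsed_seconds).

    The elapsed time is wall-clock time, so it includes process
    start-up as well as the conversion itself.
    """
    start = time.perf_counter()
    result = subprocess.run(argv)
    elapsed = time.perf_counter() - start
    return result.returncode, elapsed


# Example (assumes koconverter is installed and report.xls exists):
#   code, secs = time_command(["koconverter", "report.xls", "report.ods"])
#   print(f"exit code {code}, {secs:.1f} s")
```

Running it a few times and taking the best result gives a fairer number, since the first run also pays for loading libraries from disk.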

Take for example the test file in bug 85372 (folks, that's why it's important to file a bug report!), about 1.4 MB in size. It's hopeless to run koconverter from KOffice 1.4 on this file. You can try it, but don't blame me if your computer burns.

With KOffice 1.5, there is hope. You can have that file converted to OpenDocument in a reasonable time. On my test system (a fairly modern machine), it takes about 13 seconds.

But that could still be improved. Install KOffice 1.5.1 and try it again. Now it takes less than 2 seconds. Nice, isn't it?

Graphically, the speed comparison (shorter is better) looks like this:

[Bar chart: conversion time for the bug 85372 file (1393 KB), KOffice 1.5.0 vs. 1.5.1]

where the blue bar is for KOffice 1.5.1 and red is for KOffice 1.5.0.

Of course I didn't test only with that particular file but with other large documents as well. Take Flows.xls, a file of about 6 MB that I found via random googling. Conversion takes 28 seconds with 1.5.0 but only 4 seconds with 1.5.1:

[Bar chart: conversion time for Flow.xls (6626 KB), KOffice 1.5.0 vs. 1.5.1]

(The fact that KSpread is still slow when loading the resulting ODS files is another issue, which will hopefully be addressed in the upcoming KOffice 2.0.)

The key to the boost here is the extensive use of profilers. Right after the 1.5 release, I kept myself busy profiling the filter. Armed with Valgrind's Massif and Cachegrind and also the excellent Sysprof, it was quite practical to find the bottlenecks, though often not so easy to find workarounds. Sure, it's basically a painful and boring job, but if Linux Format keeps giving KSpread low scores, then cool shiny Slashdotted hacks won't matter much, right?
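For anyone who wants to try this kind of profiling, the sketch below builds Valgrind command lines for Massif and Cachegrind around an arbitrary command. It is not a record of the exact invocations I used; `profiling_commands` is a hypothetical helper and the output file names are arbitrary choices.

```python
def profiling_commands(argv):
    """Return Valgrind command lines that profile the command `argv`.

    --tool=massif records heap usage over time, and --tool=cachegrind
    collects instruction-level counts. The output file names are
    arbitrary choices for this sketch.
    """
    return {
        "massif": [
            "valgrind", "--tool=massif",
            "--massif-out-file=massif.out", *argv,
        ],
        "cachegrind": [
            "valgrind", "--tool=cachegrind",
            "--cachegrind-out-file=cachegrind.out", *argv,
        ],
    }


cmds = profiling_commands(["koconverter", "report.xls", "report.ods"])
# To actually run one of them (assumes Valgrind is installed):
#   import subprocess
#   subprocess.run(cmds["massif"])
```

The resulting files can then be inspected with `ms_print massif.out` and `cg_annotate cachegrind.out`. Expect the profiled run to be many times slower than a normal one.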

P.S.: If your Excel files still crash KSpread, please, please file a bug report and attach the offending files.

1 comment:

Matt Smith said...

Hi there,

The bad review in Linux Format was actually of a beta version, something which wasn't pointed out in the review but which Boudewijn Rempt pointed out. My response to that (and a link to BR's response) is here.