I finished all my exams on time, so last Monday I could starting working full-time on Krita. I’ve now been at it for a full week! How does it go?
First week according the plan was aimed at measuring the speed of Krita. We talked about our bottlenecks on IRC regulary. We also talked about them in Oslo. But we didn’t have any numbers. Sven Langkamp did some benchmarks using QTime on iterators, there were some performence tests scattered in unit tests etc. Boudewijn has decided to use new benchmark features from Qt4.6. So I wrote 10 benchmark classes where we benchmarked stuff like internal data memory management for images – our tile engine. Some access classes which access pixels for use called iterators. They allow to iterate over pixels in various ways. Vertically, horizontally , in small rectangles, randomly etc.
Another important thing is compositing. That is the work of class called KisPainter (something like QPainter in Qt but with different complicated features not available in QPainter). We benchmarked the speed of bitBlt operation with two types of memory storage. First is KisPaintDevice and second one is called KisFixedPaintDevice. The later one is lightweight version of the first one. It is similar to QPaintDevice in Qt but again with more complicated features.
Øyvind Kolås a.k.a. pippin, Gegl developer is around. Pippin shared his knowledge about benchmarking with us on irc (btw come and visit as at #krita on irc.freenode.net) We decided to make tests for blur and brightness/contrast filter. The first one is convolution filter, the second one is here because we wanted to be able to compare to GIMP.
We also made a benchmark for the image projection. The projection benchmark loads an image in Krita’s native format, computes the whole image constructed from various types of layers like group, filter, adjustment layer etc. and in the end we again save the file into native Krita format. Our focus through this plan is to speed up painting. So we can’t avoid stroke benchmarks.
Thanks go to Sven Langkamp for his work on presets saving/loading. Using paintop presets, we can use the benchmark code I did, for any paintop. We benchmarked our autobrush default paintop. It is most used paintop for digital painters. I’m very happy about this benchmark as I can test my other paintops easily, all I need to do is create a preset for any paintop, save it in Krita and run the benchmark with the preset.
There results of the benchmars are on our wiki.
So it looks: The data manager, which is responsible undo/redo and basically for storing and retrieving data is fast. It allows us to read/write data at very high speeds. From 1333.3 Mb/s to 1628.4 Mb/s, according to the benchmark results. We benchmarked on 4096×4094 RGBA image (64 Mb) which was read/wrote 100 times. For comparison memcpy for two buffers of the same size as the image we used is almost the same speed. There is benchmark code for memcpy in KisDatamanagerBenchmark, you can try it yourself if you want
Horizontal and vertical iterators are 11 times slower then Rectangular iterator. The reason is that there is no caching. From valgrind logs we see that fetching and switching tiles is very slow, so we need to implement caching there. Every 64 pixels a new tile is fetched and switched to. Why? Our tile is size of 64 pixels. This slow downs the iteration. We will cache the both iterators to avoid switching and fetching. We will cache tiles, we don’t do that now. The rectangular iterator is quite fast, does not offer so many opportunities to improve. The random iterator on the other hand is the slowest one. Again fetching and switching to tiles is expensive. Some caching strategy would be handy for moving around the image. But the use case for using the random accessor is different so the cache strategy should be somehow adaptive. The random accessor is 13-times slower than rectangular one.
The compositing operation also known as BitBlt of KisPainter is used very often. There is room for improvements, because currently it uses a slow iterator – random iterator. The speed is very decent, but we will try to make a lot faster.
Filters are very slow compared to GIMP. At least according the numbers pippin provided. But first we need to improve underlying issues like the aforementioned iterators to improve their performance. We blur the image with speed of 0.8 Mb/s. Here we need to optimize the convolution painter. Also the speed could be better when we optimize the horizontal iterator.
Strokes are slow because of the recomputation of the brush mask is needed every dab – every time brush touches the canvas in certain point. That’s slow. You can see from the valgrind log, that the math function atan2 is slow. We have to cache the result to avoid this.
The conclusion at the end of the first week is that we need to cache iterators and cache the brush mask. Some parts of Krita have very nice speed like the tile engine. Then we have slow parts like iterators. I think we can gather a good performance boost with our plan.