This week as the sponsored Krita developer was a little shorter. On Monday I traveled back from Holland to Slovakia after the Krita Hackfest 2010. We worked quite hard there, so we decided that I should take a two-day break. I started work on Thursday and finished on Friday.
As I wrote in my previous blog post, I started to work on the smudge brush, because according to the bug report it was horribly slow. Although I have written many brush engines, this one is not mine. Still, I have a lot of experience with brush engines, so I hoped to put it to use.
Thanks to consultations with Cyrille Berger, the author of this brush engine, I managed to speed up the smudge. I removed an unnecessary memory device and then had to write the famous custom bitBlt function, which takes into account selections stored in a different memory class. The performance bottleneck was the conversion of our paint device into a selection.
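To give an idea of what a selection-aware bitBlt does, here is a minimal sketch. It is not the Krita code: the function name `maskedBitBlt` and the flat single-channel 8-bit buffers are my assumptions for illustration, while the real function works on paint devices and selections. Each destination pixel is mixed with the source pixel according to the selection value at the same position.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration, not Krita's API: single-channel 8-bit buffers.
// Copies src onto dst, weighted per pixel by an 8-bit selection mask
// (0 = keep dst untouched, 255 = take src completely).
void maskedBitBlt(std::vector<std::uint8_t>& dst,
                  const std::vector<std::uint8_t>& src,
                  const std::vector<std::uint8_t>& selection)
{
    assert(dst.size() == src.size() && dst.size() == selection.size());

    for (std::size_t i = 0; i < dst.size(); ++i) {
        const unsigned m = selection[i];
        // Linear interpolation with rounding:
        // (src * m + dst * (255 - m)) / 255
        const unsigned mixed = (src[i] * m + dst[i] * (255u - m) + 127u) / 255u;
        dst[i] = static_cast<std::uint8_t>(mixed);
    }
}
```

Reading the selection directly in the copy loop like this avoids converting one buffer type into another first, which is the kind of conversion that was the bottleneck here.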
Some time ago we introduced the fixed paint device, which should speed up the composition of brush masks because it is a lightweight memory device. In the smudge, however, it required some workarounds that slowed things down. I removed those workarounds, but it was complicated, which is why nobody had done it before. I also ported some of my paintops to the fixed paint device, because according to our benchmarks it really is faster.
Here is the table of smudge timings from our KisStrokeBenchmark, in which the brush engine draws one big stroke:
1. 7,519 msec per iteration (total: 7519, iterations: 1)
2. 5,275 msec per iteration (total: 5275, iterations: 1)
3. 1,202 msec per iteration (total: 1202, iterations: 1)
4. 653 msec per iteration (total: 653, iterations: 1)
As you can see, the speedup factor is about 11.5 (7,519 msec down to 653 msec). The table lists the runs made while optimizing, from the initial time to the final optimization.
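The speedup factor is simply the old time divided by the new time. A tiny sketch, where the helper name `speedupFactor` is mine and not part of the benchmark:

```cpp
#include <cassert>

// Speedup factor between two timings of the same benchmark.
// Both arguments must use the same unit (here: msec per iteration).
double speedupFactor(double beforeMsec, double afterMsec)
{
    assert(beforeMsec > 0.0 && afterMsec > 0.0);
    return beforeMsec / afterMsec;
}
```

Calling `speedupFactor(7519, 653)` compares the first run with the last one. The same helper can be applied to neighbouring rows to see how much each single optimization step contributed.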
On Friday I was done, so I started to work on vectorization of the compositing operations. Sounds cool, no? First I wrote benchmarks for our composite operations, and next I will write example code that uses the vectorization support in GCC 4.x. Here is what I’m going to study and use. The point, of course, is to speed up the composition.
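To show roughly what I mean by vectorization in GCC, here is a small sketch using the GCC vector extensions (`__attribute__((vector_size))`, also understood by Clang). It is only an assumption of how such code could look, not the Krita composite ops: the function `compositeBlend`, the float channels, and the simple opacity blend are chosen just for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// GCC/Clang vector extension: four floats handled as a single value.
typedef float float4 __attribute__((vector_size(16)));

// Hypothetical composite op on float channels:
//   dst = src * opacity + dst * (1 - opacity)
void compositeBlend(float* dst, const float* src, std::size_t n, float opacity)
{
    assert(opacity >= 0.0f && opacity <= 1.0f);

    const float inv = 1.0f - opacity;
    const float4 a  = {opacity, opacity, opacity, opacity};
    const float4 ia = {inv, inv, inv, inv};

    std::size_t i = 0;
    // Main loop: four channels per step.
    for (; i + 4 <= n; i += 4) {
        float4 s, d;
        std::memcpy(&s, src + i, sizeof s);  // load without alignment assumptions
        std::memcpy(&d, dst + i, sizeof d);
        d = s * a + d * ia;                  // element-wise vector arithmetic
        std::memcpy(dst + i, &d, sizeof d);
    }
    // Scalar tail for the remaining 0..3 channels.
    for (; i < n; ++i)
        dst[i] = src[i] * opacity + dst[i] * inv;
}
```

The compiler maps the `float4` arithmetic to SIMD instructions where the target supports them, so one multiply-add handles four channels at once. Whether this beats the scalar loop in practice is exactly what the benchmarks are for.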