I’m excited, and I hope you’ll be too.
David Edmundson and I have been working hard the last weeks. It’s not that we don’t usually work hard, but this time I’m really excited about it.
A bit of context: in Plasma an important part of the system drawing is painting frames (others are icons, images and the like). Those are in general the elements that are specified in the Plasma themes. These will be buttons, dialog backgrounds, line edit decorations, etc.
So far, to paint those we were assembling the full image in the CPU and then sending it over to OpenGL as a full texture, then we would do the composing of all the different frames, according to the information provided by QtQuick, through the Qt Scene Graph. There are 2 main problems in the current approach.
- We were maintaining huge textures in memory. Every frame was completely stored in memory and gpu memory. Which means that the bigger the dialogs are, the more memory we consume, even though the texture is flat.
- Every time we resize the frame, we have to re-assemble the frame in CPU memory and upload it again.
First: The 9-patch approach
First we made it possible to have the frames to be rendered by each different parts and assembled by the GPU. This wasn’t possible, because Plasma themes are quite complex, so now we have 2 different paths. If the theme element can take advantage of the optimization, it will use the new code, otherwise it will stay working beautifully on the former, thorough implementation [1].
Therefore, instead of rendering all the frame now we’ll be uploading 9 textures to the GPU, and let it either tile or stretch depending on the settings in the theme. This way:
- we are uploading 9 tiny textures rather than a big texture.
- when the frame is resized, we tell the nodes to resize and the GPU does the job [2].
Second: Caching the textures
Now everything was in place, we’d have many times the small 9 elements but we kept uploading them to the GPU over and over. It’s little textures, but it’s still better if we get to re-use what we already have. To do so, we’ve placed a little hash table that keeps track of the already created textures to re-use them. This way, we get to tell the Qt Scene Graph to use a texture that has already been uploaded rather than a new one. We’ve run some tests, here’s the result:
- In PlasmaShell we get 342 miss and 126 hits, so roughly 25% of bandwidth and memory improvement
- In KRunner we get 108 miss and 369 hits, so roughly 350% improvement on memory and bandwidth improvement.
Future, further work
Sadly enough, raw memory usage is still quite high, when running plasma shell on massif, we are still reported as most of the memory usage being in the GPU graphics card (or rather i965_dri.so), so we’ll have to dig it [3]. We’ve found some ways to improve this, for example by enforcing OpenGLES 2, but this requires Qt 5.4 which is due October. I implemented it nevertheless, and it works fine.
Being more precise, a big offender is using a wallpaper image. We’ve looked into it, the code looks fine, but then it makes a big difference, so big that I still don’t understand how it can be. A good suggestion if you’re testing Plasma 5 on a system low on memory, is to run it with the plain color wallpaper. We can save up to 30% of memory consumption, no kidding (it actually depends on who you ask, either massif, htop or ksysguard; but they all agree it’s a big deal). We’ve investigated a bit and found ways to improve the situation there, but if you are interested, feel free to join!
Finally, another problem with regards to memory consumption is QML. We make heavy usage of it and it shows memory-wise. We should see if we can adopt any optimization to stream-line our usage, but admittedly it’s much better than one would have expected.
Testing
If you want to give it a try, you can already find most of this in master, and it will be available from the next KDE Frameworks 5.1 release which will be available by the second week of August.
Hope you liked it, it was a great exercise to investigate all this! I learned a lot and gained quite some respect for the Qt Scene Graph and QML development teams. Keep rocking!
[1] More precisely, at the day, when there’s no hint-compose-over-border or mask-*-overlay elements
[2] an exception for it being (hint-stretch-borders and hint-tile-center hints, where we’ll have to re-render on resize it).
[3] David, Vishesh and I we all have Intel drivers, but I guess it’s a good card to test-case on, given how mainstream it is, currently.