Taking advantage of OpenGL from Plasma

4 August 2014 / apol / 12 Comments

I’m excited, and I hope you’ll be too.

David Edmundson and I have been working hard over the last few weeks. It’s not that we don’t usually work hard, but this time I’m really excited about it.

A bit of context: in Plasma an important part of the system drawing is painting frames (others are icons, images and the like). Those are in general the elements that are specified in the Plasma themes. These will be buttons, dialog backgrounds, line edit decorations, etc.

So far, to paint those we assembled the full image on the CPU and sent it over to OpenGL as a single texture. The Qt Scene Graph would then compose all the different frames according to the information provided by QtQuick. There are 2 main problems with this approach:

We were maintaining huge textures in memory. Every frame was stored in full in both system memory and GPU memory, which means that the bigger the dialogs are, the more memory we consume, even though the texture is flat.
Every time we resize the frame, we have to re-assemble it in CPU memory and upload it again.

First: The 9-patch approach

First we made it possible for frames to be rendered as separate parts and assembled by the GPU. This isn’t always possible, because Plasma themes are quite complex, so now we have 2 different paths. If the theme element can take advantage of the optimization, it will use the new code; otherwise it will keep working beautifully on the former, thorough implementation [1].

Therefore, instead of rendering the whole frame, we now upload 9 textures to the GPU and let it either tile or stretch them depending on the settings in the theme. This way:

we are uploading 9 tiny textures rather than a big texture.
when the frame is resized, we tell the nodes to resize and the GPU does the job [2].
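
To make the idea concrete, here is a minimal sketch in Python (not the actual Plasma code, which is C++ on top of the Qt Scene Graph). The function names and the margin parameters are hypothetical; they stand in for the border sizes a theme would provide. It only computes where the 9 pieces land for a given frame size, and what a full 32-bit texture of that size would cost.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def nine_patch_rects(width, height, left, top, right, bottom):
    """Split a width x height frame into 9 rects, row by row from the top-left.

    left/top/right/bottom are hypothetical border sizes in pixels.
    Corners keep their size; edges and center absorb the resize.
    """
    if left + right > width or top + bottom > height:
        raise ValueError("frame is smaller than its borders")
    xs = [0, left, width - right, width]
    ys = [0, top, height - bottom, height]
    return [
        Rect(xs[c], ys[r], xs[c + 1] - xs[c], ys[r + 1] - ys[r])
        for r in range(3)
        for c in range(3)
    ]


def rgba_bytes(w, h):
    # Assumes 4 bytes per pixel (32-bit RGBA), uncompressed.
    return 4 * w * h


rects = nine_patch_rects(800, 600, 8, 8, 8, 8)
full_texture = rgba_bytes(800, 600)  # what the old path would keep around
```

Resizing the frame only changes the rectangles of the edge and center pieces, which the GPU then stretches or tiles; the small source textures stay the same.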

Second: Caching the textures

Once everything was in place, we had the same 9 small elements many times over, but we kept uploading them to the GPU over and over. They’re little textures, but it’s still better if we get to re-use what we already have. To do so, we’ve added a little hash table that keeps track of the already created textures so we can re-use them. This way, we get to tell the Qt Scene Graph to use a texture that has already been uploaded rather than a new one. We’ve run some tests, here’s the result:

In PlasmaShell we get 342 miss and 126 hits, so roughly 25% of bandwidth and memory improvement
In KRunner we get 108 miss and 369 hits, so roughly 350% improvement on memory and bandwidth.
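
As an illustration of the hash table idea and of how to read the counts above, here is a small Python sketch. It is a toy model under assumptions: the class, the key format and the upload callable are all made up for the example, and it ignores texture lifetime, which a real cache has to handle.

```python
class TextureCache:
    """Toy stand-in for a cache of already-uploaded textures."""

    def __init__(self, upload):
        self._upload = upload  # hypothetical callable: key -> texture handle
        self._textures = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._textures:
            self.hits += 1  # re-use what is already on the GPU
        else:
            self.misses += 1  # first request: upload once and remember it
            self._textures[key] = self._upload(key)
        return self._textures[key]


def hit_rate(hits, misses):
    # Fraction of texture requests served without a new upload.
    return hits / (hits + misses)


plasmashell = hit_rate(126, 342)
krunner = hit_rate(369, 108)
```

With the counts above, about 27% of the texture requests in PlasmaShell and about 77% in KRunner are served from the cache. The KRunner figure also works out to roughly 3.4 hits per miss, which is presumably where the 350% comes from.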

Future, further work

Sadly enough, raw memory usage is still quite high. When running plasma shell under massif, most of the memory usage is still attributed to the graphics card (or rather i965_dri.so), so we’ll have to dig into it [3]. We’ve found some ways to improve this, for example by enforcing OpenGL ES 2, but this requires Qt 5.4, which is due in October. I implemented it nevertheless, and it works fine.

More precisely, a big offender is using a wallpaper image. We’ve looked into it and the code looks fine, but it makes a big difference, so big that I still don’t understand how it can be. A good suggestion, if you’re testing Plasma 5 on a system low on memory, is to run it with the plain color wallpaper. We can save up to 30% of memory consumption, no kidding (it actually depends on who you ask, either massif, htop or ksysguard; but they all agree it’s a big deal). We’ve investigated a bit and found ways to improve the situation there, and if you are interested, feel free to join!
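
Part of the cost is simply that a wallpaper has to be held decoded. As a back-of-the-envelope sketch in Python, assuming a 1920x1080 wallpaper stored as uncompressed 32-bit pixels (both are assumptions for the example, not measured values):

```python
def decoded_bytes(width, height, bytes_per_pixel=4):
    """Size of one uncompressed copy of an image, in bytes."""
    return width * height * bytes_per_pixel


# Assumed screen-sized wallpaper, 4 bytes per pixel.
one_copy = decoded_bytes(1920, 1080)

# If a decoded copy is kept in system memory as well as on the GPU,
# the cost is paid twice.
two_copies = 2 * one_copy
```

That is about 8.3 MB per copy, regardless of how small the PNG or JPEG file is on disk.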

Finally, another problem with regard to memory consumption is QML. We make heavy use of it and it shows memory-wise. We should see if we can adopt any optimizations to streamline our usage, but admittedly it’s much better than one would have expected.

Testing

If you want to give it a try, you can already find most of this in master, and it will be part of the next KDE Frameworks 5.1 release, which is due by the second week of August.

Hope you liked it, it was a great exercise to investigate all this! I learned a lot and gained quite some respect for the Qt Scene Graph and QML development teams. Keep rocking!

[1] More precisely, as of today, when there are no hint-compose-over-border or mask-*-overlay elements.

[2] The exceptions are the hint-stretch-borders and hint-tile-center hints, where we’ll have to re-render on resize.

[3] David, Vishesh and I all have Intel drivers, but I guess it’s a good card to use as a test case, given how mainstream it currently is.


5 Comments

Pascal
4 August 2014 at 10:59 am

How does Plasma desktop 5 (with these changes) memory usage match up against Plasma 1?
More/less memory usage? (possibly with some approximate numbers, 50% 200%….)
Milian Wolff
4 August 2014 at 11:34 am

Hey guys,

regarding the memory impact of a high-definition wallpaper: This is (sadly) to be expected. PNG/JPEG are quite good at compressing data, but when you load such images into CPU/GPU RAM you decode that data and that can easily lead to multiple megs of memory consumption… I mean it’s going to be at least 32bit * 1920 x 1080 ~ 8.3MB. A colleague of mine says that you could try the (patented?) vendor-specific OpenGL extensions for compressed textures. Or wait for unified memory models to be usable.

Bye
apol (Post author)
4 August 2014 at 3:18 pm

#1 it’s still about 2x the memory usage, it largely depends on what tool you ask though.

#2 I know, Milian, we’ve run those numbers as well. We’ve been pondering just not keeping the QImage on the user side: once the texture is uploaded to the GPU, it makes little sense to keep a copy of it in memory. It doesn’t get requested again that often, and in those cases we could probably just load it from the filesystem again.
markg85
4 August 2014 at 11:03 pm

Hi,

You mention some improvements like: “In PlasmaShell we get 342 miss and 126 hits, so roughly 25% of bandwidth and memory improvement” but you don’t mention what it used to be. On a second note, having 342 misses and 126 hits (let’s say a 3 to 1 ratio) is still quite bad. Have you considered looking at the Google TextureManager for QML that I linked to on plasma-devel [1, 2]? I don’t know if _that_ is what you need, but you do need a bit more room for textures that are dropped out of the internal Qt cache as soon as you don’t use them anymore. They should stay in memory for a few more seconds (or even minutes) to be re-used when you need the same texture again.

Regarding a wallpaper and memory size, I think that’s not an issue. You can win much more by optimizing QML, but I fear further optimizations in that area quickly become very complicated in the QtQuick core itself. For the wallpaper we have the same issue in KDE 4.x or any pre-Plasma 5 thing, only there PC memory is used instead of GPU memory. To have a wallpaper you just need to have it decompressed in memory unless you have an OpenGL (ES) extension that can display a compressed image. Perhaps this helps: http://www.ciaranmccormack.com/texture-compression-for-opengl-es-2-0/

But if you run intel, you should have OpenGL ES 3.0 ETC2 texture compression: http://androidworks-kea.blogspot.nl/2013/08/developers-notes-ii-etc2-texture.html

You could also have “GL_OES_texture_compression_astc” which is the next generation royalty free texture compression. I’m not quite sure if that extension is “part of” OpenGL ES 3 or just an extension for it..

Good luck 🙂

[1] https://github.com/google/VoltAir/blob/develop/VoltAir/Engine/graphics/TextureManager.cpp
[2] https://github.com/google/VoltAir/blob/develop/VoltAir/Engine/graphics/TextureManager.h
Fabian
8 August 2014 at 12:11 am

> I mean it’s going to be at least 32bit * 1920 x 1080 ~ 8.3MB
That plus the mipmaps, which should be disabled for background images. I haven’t looked at the source code yet, but if mipmapping is enabled, it makes sense to disable it.