Colors

libavg has always supported specifying colors in HTML-like fashion: You pass a string of six hex digits, two each for the red, green and blue components of the color. That's nice if you're copying the value from Gimp or Photoshop, but it's quite limiting if you want to actually calculate colors or maybe interpolate between two colors.

So… there's a shiny new Color class that can be constructed from a tuple and passed wherever color strings were expected before (of course, strings still work). Colors can be mixed and used in animations. Last but not least, I did some research on smart ways to mix colors. It turns out that a simple weighted average of the RGB components doesn't look so good – for instance, mixing 50% blue with 50% yellow gives you gray. So libavg now mixes in CIE LCh color space, which more-or-less preserves the saturation and looks pretty good.

As a result, you can now do this:

from libavg import avg

# rootNode is the application's root node; each rectangle gets a color
# interpolated between blue and yellow.
startColor = avg.Color("0000FF")
endColor = avg.Color("FFFF00")
for i in xrange(11):
    color = avg.Color.mix(startColor, endColor, 1-(i/10.))
    avg.RectNode(pos=(i*20,0), size=(20,20), fillcolor=color,
            fillopacity=1.0, parent=rootNode)

to produce:

[Image: a row of rectangles blending from blue to yellow]

More info on color mixing is in this blog post.

Raspberry Pi Update

libavg has supported the Raspberry Pi for a while now. In the last few weeks, I set up a cross-compile toolchain: You compile libavg for the Raspberry Pi on a separate Linux machine. This means compiling is done in a few minutes (as opposed to an hour or more if you compile directly) – here are build instructions. Also, after a bugfix, we have full libavg functionality on the Pi. Video decoding, sadly, is still done in software, since the first two people who tried implementing it have given up – I'll see what I can do on that front.

Speed

libavg just got another major optimization.

I implemented an image registry and cache for libavg. ImageNodes that reference the same image file now reference the same bitmap in CPU memory and the same texture in GPU memory. This is completely hidden from the app developer, who just specifies the file location for all instances. The obvious benefit is that this saves a lot of memory if an application re-uses lots of bitmaps. The less obvious benefit is that it speeds things up as well: avg_checkspeed, which tests with thousands of identical ImageNodes, can now handle around 15,000 nodes at 60 FPS on my old i7 (still Core i7 920 Bloomfield, 2.66 GHz, NVidia GF260 like in the old benchmarks). This is twice as many as before.
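To illustrate (a minimal sketch – sprite.png and rootNode are placeholders): an application can simply create lots of ImageNodes that point at the same file, and the registry makes sure the bitmap and texture exist only once.

import random
from libavg import avg

# Hundreds of nodes, one shared bitmap in CPU memory and one texture on the GPU.
for i in range(500):
    avg.ImageNode(href="sprite.png",
            pos=(random.randint(0, 1200), random.randint(0, 700)),
            parent=rootNode)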

A Tutorial

Finally, libavg has a tutorial. Over the last few weekends, I’ve put together a short but thorough tutorial on libavg. It covers the important concepts – app structure, scene graph, update loop, event handling and deriving your own node classes – and it does this in the context of a short and very nice 500-line program that exercises all of these concepts.

The firebirds sample that’s been included in libavg for a while is the basis for this tutorial. In fact, when Scotty wrote the sample two or three years ago, I promised him that I’d write a tutorial based on it – Scotty, thanks for the sample and sorry for taking so long!

Multithreading, Realtime Graphics and Process Affinity Masks

In libavg, we try to make it as easy as possible to have a consistent framerate that matches the screen refresh rate. For almost all current systems (and ignoring new developments such as NVidia G-Sync), that means delivering a new frame every 16.67 milliseconds.

To make this possible, libavg is designed as a multi-threaded system and long-running tasks are moved to separate threads. So, for instance, the BitmapManager class loads image files in one or more background threads (the number of background threads is configurable using BitmapManager.setNumThreads()), the VideoWriter uses a background thread to encode and write video files, and all videos are decoded in background threads as well. Besides enabling quick screen updates in the main thread, this also allows libavg-based programs to utilize more than one core in a multi-core computer. The threads are distributed among the cores by the operating system according to the load, and in general, this works pretty well.
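As a rough sketch of how this looks from the application side (the exact loadBitmap() and setNumThreads() call signatures here are from memory and may differ; imageNode is an existing ImageNode), you hand off image loading to the background threads like this:

from libavg import avg

def onBitmapLoaded(bitmap):
    # Called back once the file has been decoded in a background thread.
    imageNode.setBitmap(bitmap)

avg.BitmapManager.get().setNumThreads(2)    # two background loader threads
avg.BitmapManager.get().loadBitmap("background.png", onBitmapLoaded)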

However, the operating system has no way of knowing that one of the libavg threads is special and should be able to churn out frames at 60 fps. So, if the background threads cause too much load, some of them will run on the same core that the main thread is running on, and framerate can become irregular.

Happily, there's a cure for the issue: We lock the screen update thread to one specific core and keep all other threads off that core using thread affinity functions.
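The idea is easy to illustrate in Python, even though libavg itself does this in C++ with the platform's native affinity calls. The sketch below uses os.sched_setaffinity(), which is Linux-only:

import os

# Pin the calling process/thread to core 0; background workers would get the
# complementary mask so they never compete with the screen update thread.
os.sched_setaffinity(0, {0})
# e.g. for a worker with id workerPid (hypothetical variable):
# os.sched_setaffinity(workerPid, set(range(1, os.cpu_count())))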

Version 1.8.2 Released

Release 1.8.1 broke audio on some Linux machines and all Macs, so we've released version 1.8.2, which just fixes this bug.

Switch to github & Version 1.8.1 Released

We've moved the libavg sourcecode from our own svn repository to github. Here's the project in its full glory: https://github.com/libavg/libavg. Obviously, that gives us the full power of github, including much better issue and branch/merge tracking, easy forking, etc. If you look at the libavg network graph, you can see that we're busy using all these new capabilities :-).

Most of the work needed – including moving all branches, issue tracking, adapting the continuous build, fixing web links and instructions etc. – was done by Richy, with help from OXullo and Benedikt Seidl. Thanks!

Also, we just released version 1.8.1, a bugfix release. Of course, we happily used github support for this.

Supporting Twelve Screens at Once

Our latest and biggest (not to mention coolest) toy at the Interactive Media Lab Dresden is a ten-square-meter interactive wall that's fully touch-sensitive and supports markers and pens as well. It consists of twelve Full HD monitors hooked up to two Radeon 7970 graphics cards in a single dual-Xeon workstation. Since a single workstation drives it, we can cover the complete wall with a single application, which is really cool and sets it apart from most similar setups. However, the dual-graphics-card setup causes issues: Under Linux, we have two separate desktops, and under Windows, applications that span the graphics card boundary are extremely slow.

To get full-screen rendering at interactive speeds, you basically have to open two borderless windows – each spanning 6 screens and pinned to one of the GPUs. Then you render the same scene with different viewports in each of the windows. That means that all context-specific data – textures, vertex buffers, shaders, framebuffer objects, and even caching of shader parameters – needs to be replicated across both contexts. Also, we can’t switch contexts too often, because that would make things slow.

libavg renders in two passes: The first (implemented in the Node.preRender() functions) prepares textures and vertex data. It also renders FX nodes. The second pass (implemented in Node.render()) actually sends render commands to the graphics card. The multi-context code changes a few things: While preRender() is still executed only once, render() is executed once per GPU. Data uploads as well as effects that need to be rendered are scheduled in preRender() and actually executed at the beginning of the render pass. In total, refactoring everything accordingly was (obviously) a lot of work that touches code all over the graphics engine, but the result is good rendering performance at 24 megapixels of resolution.
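In pseudocode, the per-frame flow looks roughly like this (a simplified sketch, not the actual C++ engine code; makeContextCurrent() and executeScheduledUploads() are stand-ins for the real internals):

def onFrame(windows, rootNode):
    # Runs once: prepare vertex data, schedule texture uploads, render FX nodes.
    rootNode.preRender()
    for window in windows:                  # one window per GPU, six screens each
        window.makeContextCurrent()
        window.executeScheduledUploads()    # scheduled work from preRender()
        rootNode.render()                   # render commands, once per GPU context
        window.swapBuffers()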

The code is still on a branch (The svn repository is at https://www.libavg.de/svn/branches/experiments/libavg_multicontext/), but it passes all tests, and I’ll merge it to trunk after we’ve used it a bit.

Version 1.8 Released

I’ve just released version 1.8 of libavg. You can download it at the usual place.

This means that finally, all of the cool features that have been in the development version for a while are available in an easy-to-install package. The release includes a skinnable widget library, a unified event handling framework (see Cleaning up Messaging), and a much-improved App class by OXullo. Scotty added a very nice sample program called firebirds that showcases libavg development in a compact form, and Richy implemented a new logging framework. Rendering is much faster (see Speeding up Rendering), and I completely rewrote the video decoding subsystem (see Video Decoding using libav and ffmpeg). Lots of other things have been improved as well – see the NEWS file for details.

Lots of thanks to OXullo, Richy and Scotty for testing the release!

Video Decoding using libav and ffmpeg

I spent the last month completely taking the libavg video decoding module apart and putting it together again. I'm finally convinced that the code is well-designed and readable – and it's fast. It turns out that getting good video decoding is not as easy as it sounds, so I've written up a complete description of the insides for anyone who's interested: VideoDecoding.

The weird thing is that from the outside, it looks like a solved problem, so every time I start telling someone about this, I get the same reaction. There are libraries for that, right? You just plug in libav or ffmpeg or gstreamer or proprietary things like QuickTime or DirectShow. All these libraries have existed for years, so they're stable and easy to use, right?

Well, yes and no. If you don't need advanced features, high-level libraries like gstreamer might do what you want. But we want frame-accurate seeking and a low-latency mode, as well as color space conversion using shaders. Opening and closing video files shouldn't cause any interface stutters, and so on. Also, libavg can't work with proprietary libs – we need something that works cross-platform. That leaves libav/ffmpeg, and this library exposes a pretty low-level interface. It does support every codec but the kitchen sink (pardon the wording) and gives you control over every knob these codecs have. That's really great, because you wanted control, right? Anyway, you can get everything done with libav/ffmpeg, but suddenly things get complicated. For starters, you're suddenly juggling five threads: demuxer, video decoder, audio decoder, display and audio mixer. libav/ffmpeg leaves all the threading to the user, so you're dealing with a pretty complicated real-time system where lots of things happen at the same time. Dranger's tutorial helps, but it's dated.
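To give an idea of the structure (a toy sketch in Python – the real code is C++, and readNextPacket()/decodePacket() are hypothetical placeholders): the demuxer and the decoders are connected by bounded queues, so each thread blocks when the next stage can't keep up.

import threading
import Queue            # Python 2 spelling, matching the era of this post

packetQueue = Queue.Queue(maxsize=64)   # demuxer -> video decoder
frameQueue = Queue.Queue(maxsize=8)     # video decoder -> display thread

def demuxerThread():
    while True:
        packetQueue.put(readNextPacket())       # blocks if the decoder lags behind

def videoDecoderThread():
    while True:
        frameQueue.put(decodePacket(packetQueue.get()))

threading.Thread(target=demuxerThread).start()
threading.Thread(target=videoDecoderThread).start()
# The display thread pulls from frameQueue once per frame; audio decoding and
# mixing run in their own threads with separate queues.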

To make things worse, the interface of libav/ffmpeg changes with minor revision numbers, so to support a few years of operating systems, you find yourself adding a generous amount of #ifdefs to the code. I couldn't find documentation that describes which changes happened in which minor revision, so you need to guess appropriate version numbers for the #ifdefs based on tests with multiple systems. Oh, and there are actually several constituent libraries, each with its own version number. Of course, you need to query the correct one. All of that takes time, and the resulting code is hard to read and test. In addition, since ffmpeg forked and the developers aren't on speaking terms (see this and this if you really want to know more), you need to test with libav (the fork) and ffmpeg (the original) if you want maximum compatibility.

All of this is really a pity, because I think the libav/ffmpeg developers are insanely smart guys and the library does do a really admirable job of de- and encoding everything you can throw at it. Also, if I'm honest, most of the time went into figuring out how to organize the different threads well – and that's something I really can't blame libav/ffmpeg for.

Anyway, we’re now ready to add Raspberry Pi (read: OpenMAX IL) and VA-API hardware decoding, seamless audio loops and other cool things to libavg.

Raspberry Pi Support

I’m sure most of you have heard of the Raspberry Pi, a $25 ARM computer that runs Linux. We’ve spent quite a bit of time in the last weeks getting libavg to run on this machine, and I’m happy to say that we have a working beta. We render to a hardware-accelerated OpenGL ES surface and almost all tests succeed. Besides full image, text and software video support, that includes all compositing and even offscreen rendering and general support for shader-based FX. We have brief setup instructions at https://www.libavg.de/site/projects/libavg/wiki/RPI. Update: The setup instructions have been updated for cross-compiling (much faster!) and moved to https://www.libavg.de/site/projects/libavg/wiki/RaspberryPISourceInstall.

Most of the work was getting libavg to work with OpenGL ES. We now decide whether to use desktop or mobile OpenGL depending on a configure switch, an avgrc entry and the hardware capabilities. Along the way, we implemented mobile context support under Linux for NVidia and Intel graphics systems, so we can now test most things without actually running (and compiling!) things on the Raspberry. Speaking of which – compiling for the Raspberry takes a long time. Compiling on it is impossible because there just isn’t enough memory. We currently chroot into a Raspberry file system and compile there (see the notes linked above).

A lot of things are already implemented the way they should be for a mobile system. That means that, for example, bitmaps are loaded (and generated, and read back from texture memory…) in either RGB or BGR pixel formats depending on the flavor of OpenGL used, and the vertex arrays are smaller now, so we save bandwidth. Still, there's a lot of optimization to do. Our next step is getting things stable and fast. We want hardware video decoding, compressed textures – and in general, we'll be profiling to find spots that take more time than they should.

Cleaning up Messaging

Over time, libavg has accumulated support for a number of message callbacks – among them Node.connectEventHandler(), VideoNode.setEOFCallback() and Contact.connectListener().

In addition, we’re currently adding some widget classes, and that adds more callbacks for button presses, list scrolling, etc.

While this allows you to get a lot of things done, it's not consistent and hence not very easy to learn. The methods used to register for messages aren't standardized: They have inconsistent names and varying parameters. Some allow you to register several callbacks for an event, some don't. For example, compare Node.connectEventHandler() to the gesture interface using constructor parameters. The implementation is just as problematic: We have multiple callback implementations in C++ and Python, which results in error-prone, high-maintenance code.

Publishers

When work on the new widget classes promised to make things even more convoluted, we decided to do something about the situation and implement a unified, consistent messaging system. The result is a publisher-subscriber system:

  • Publishers register MessageIDs.
  • Anyone can subscribe to these MessageIDs by registering callbacks. Several subscribers are possible in all cases.
  • When an event occurs, all registered callbacks are invoked.

We spent quite a bit of time making a lot of things “just work”. The subscription interface is very simple. As an example, this is how you register for a mouse or touch down event:

node.subscribe(node.CURSOR_DOWN, self.onDown)

Any Python callable can be registered as a callback, including standalone functions, class methods, lambdas and even class constructors. In most cases, you don’t have to deregister messages to clean up either. Subscriptions are based on weak references wherever possible, so when the object they refer to disappears, the subscription will just disappear as well.

You can write your own publishers in Python or C++ by simply deriving from the Publisher class. In Python, you need two lines of code to register a message:

class Button(avg.DivNode):
    CLICKED = avg.Publisher.genMessageID()
    [...]
    def __init__(...):
        self.publish(self.CLICKED)
    [...]

and this line invokes all registered subscribers:

self.notifySubscribers(self.CLICKED, [])

The second parameter to notifySubscribers is a list of parameters to pass to the subscribers.
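On the subscriber side, hooking into the new message works just like subscribing to a built-in one (a small sketch; onClicked, rootNode and the Button constructor arguments are placeholders):

def onClicked():
    print "Button was clicked."

button = Button(parent=rootNode)
button.subscribe(Button.CLICKED, onClicked)

# If the publisher passes parameters, e.g. self.notifySubscribers(self.CLICKED, [event]),
# the callback simply receives them as arguments.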

Transitioning

Transitioning old programs to the new interface is not very hard and involves replacing old calls to Node.connectEventHandler(), VideoNode.setEOFCallback(), Contact.connectListener() and so on with invocations of subscribe(). We'll keep the old interfaces around for a while, but they'll probably be removed when we release version 2.0.
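For instance, a typical change looks roughly like this (the old connectEventHandler() signature is reproduced from memory and may not match exactly):

# Before:
node.connectEventHandler(avg.CURSORDOWN, avg.MOUSE | avg.TOUCH, self, self.onDown)
# After:
node.subscribe(node.CURSOR_DOWN, self.onDown)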

The End of Touch Jitter

On lots of multitouch devices, input suffers from jitter: The actual touch location is reported imprecisely and changes from frame to frame. This has obvious negative effects, since it’s much harder to hit a target this way. For years, people have been telling me that a lowpass filter would help. In its simplest form, a lowpass filter averages together the location values from the last few frames. This removes most of the jitter – because the jitter is random, there’s a good chance that the errors in successive frames cancel each other out. On the other hand, it adds latency because the software is not using the latest data. This tradeoff didn’t seem like a good one to me, so I didn’t add a jitter filter to libavg.

However, at this year's CHI conference, Géry Casiez and coauthors published a paper on the 1€ Filter. This filter is based on an extremely simple observation: Precise positions are only important when the user is moving his finger slowly, while latency is important at fast speeds. So, the solution to the dilemma I described in the first paragraph is to build a filter that adjusts its latency depending on speed. Their filter is extremely simple to implement, and the results are really nice.
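To show just how simple it is, here's a rough one-dimensional Python version – my own sketch of the published algorithm with roughly its suggested parameters, not the libavg implementation, which filters both coordinates in C++:

import math

class OneEuroFilter(object):
    def __init__(self, freq, minCutoff=1.0, beta=0.01, dCutoff=1.0):
        self.freq = freq              # expected update rate in Hz
        self.minCutoff = minCutoff    # jitter filtering at low speeds
        self.beta = beta              # how quickly latency drops at high speeds
        self.dCutoff = dCutoff
        self.lastX = None
        self.lastDX = 0.0

    def __alpha(self, cutoff):
        # Smoothing factor of a first-order lowpass at the given cutoff frequency.
        tau = 1.0 / (2 * math.pi * cutoff)
        return 1.0 / (1.0 + tau * self.freq)

    def apply(self, x):
        if self.lastX is None:
            self.lastX = x
            return x
        dx = (x - self.lastX) * self.freq                       # estimated speed
        self.lastDX += self.__alpha(self.dCutoff) * (dx - self.lastDX)
        cutoff = self.minCutoff + self.beta * abs(self.lastDX)  # adapt the cutoff
        self.lastX += self.__alpha(cutoff) * (x - self.lastX)
        return self.lastX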

libavg can now process the touch input positions using this filter. The filter parameters are configurable in avgrc, and there’s a configuration utility (avg_jitterfilter.py) that helps in finding correct filter values. The complete implementation is in the libavg_uilib branch – I’ll merge it to trunk in the next few weeks.

Talk On SimMed

In April, I gave a talk on SimMed, our multitouch medical education project (using libavg). The people at the Saarbrücken Centre for e-Learning Technology have put a video of the talk online. I believe it shows very well what we’re trying to do (sorry it’s in German):

Intel Graphics

After the rendering optimization I described in my last post, tests with Intel Atom chipset graphics (N10 chipset) uncovered a problem. The system was running in software rendering mode, which slows things down by a factor of about a thousand. It turns out that more than two texture accesses in a shader are too much for the hardware. Additionally, lots of Intel chips render all vertex shaders in software, and that also causes a tenfold slowdown if the libavg 3-line vertex shader is in use.

So now, there’s a second rendering path with minimal shaders that does vertex processing the old-fashioned way (glMatrixMode etc.) and uses a different shader for those nodes that don’t need any special processing. Still, I recommend staying away from Intel Atom graphics. There is way better hardware out there at the same price point.

Speeding up Rendering

libavg's rendering has been fast enough for many applications for a while. A decent desktop computer could render between 2000 and 5000 nodes at a framerate of 60 in version 1.7. This is probably already more than most frameworks offer, but for big applications, it's not enough. For instance, someone tried to build a Game of Life application with one node per grid point – and ran into performance issues. SimMed spends an inordinate amount of time rendering 2D as well. Also, particle animations and similar effects need lots of nodes.

So, I went and optimized the rendering pipeline. As a bonus, I was able to remove lots of deprecated OpenGL function usage, thus getting us a lot closer to mobile device support.

tl;dr: On a desktop system with a good graphics card, the benchmarks now show libavg rendering two or three times as many nodes as before.

The new rendering pipeline

One mantra that's often repeated when optimizing graphics pipelines is “minimize state changes” (see Tom Forsyth's blog entry on renderstate change costs and NVidia's GDC talk slides). Pavel Mayer once (over-)simplified this to “minimize the number of GL calls”, and my experience has been that that's actually a very good starting point.

Today's graphics cards are optimized for large, complex 3D models with comparatively few textures. 2D applications running on 3D graphics cards, in contrast, render lots of small primitives – mostly rectangles – with different textures. A naive implementation uses one vertex buffer per primitive. That results in a huge number of state changes and is about the worst way to use current graphics cards.

The new rendering pipeline makes the most of the situation by:

  • Putting all vertex coordinates into one big vertex buffer. This vertex buffer is uploaded once per frame, activated and used for all rendering. The one big upload takes less time than actually figuring out what needs to be uploaded and doing the work piecewise.
  • Using one standard shader for all nodes. This shader handles color space transforms, brightness/contrast/gamma and masks, meaning it does a lot more work than is necessary for most nodes. However, the shader never changes during the main rendering pass. It turns out that the increased per-pixel processing is no problem for all but the slowest GPUs, while the state changes that would otherwise be needed cost significant time on the CPU side.
  • Rendering FX nodes to textures in a prerender pass using their own shaders.
  • Generally moving GL state changes outside of the render loop if possible and substituting shader parameters for old-style GL state.
  • Caching all other GL state changes. There are just a few GL state variables that still change during rendering (to be precise: glBlendColor, the active blend function, and parameters to the standard shader). Setting a shader parameter to the same value repeatedly no longer causes several GL calls – see the sketch after this list.
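The caching idea itself is simple; in Python-flavored pseudocode (an illustration of the principle, not the engine's actual C++ code):

class CachedShaderParam(object):
    def __init__(self, setGLParam):
        self.setGLParam = setGLParam    # function that issues the real GL call
        self.curValue = None

    def set(self, value):
        # Only touch GL if the value actually changed since the last call.
        if value != self.curValue:
            self.setGLParam(value)
            self.curValue = value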

There were also a few non-graphics related optimizations – profiling information is now only collected if profiling is turned on, for example.

Results

Without further ado, here are some benchmarks using avg_checkspeed and avg_checkpolygonspeed. They show nodes per frame at 60 FPS on a typical desktop system (Core i7 920 Bloomfield, 2.66 GHz, NVidia GF260):

Desktop, Linux (Ubuntu 12.04, Kernel 3.2)

libavg Version    Images    Polygons
1.7                 2200        3500
Current             7000        7000

Desktop, Win 7

libavg Version    Images    Polygons
1.7                 2700        5000
Current            10000        9500

On my MacBook Pro (Mid-2010, Core i7 Penryn, 2.66 GHz, NVidia GF330M graphics, Snow Leopard), the maximum number of nodes rendered did not increase. However, the CPU load while rendering went down – so we have a GPU bottleneck here:

MacBook Pro

libavg Version    Images                   Polygons
1.7                1000, 100% CPU load      1600, 100% CPU load
Current            1000, 80% CPU load       1600, 40% CPU load

More precisely, since changing multisampling settings has an effect on speed, fragment processing is the bottleneck. Changing to minimal shaders doesn't have an effect on speed, so my current guess is texture fetches. But that's for the next iteration of optimizations.