“They’re dials! Just simple dials! How could they mess them up so badly?” Believe it or not, it’s something we’ve cried out many times here at Imagination.
If you’re wondering what on earth I’m going on about, which is reasonable, let me back up and provide some context.
What we’re talking about are the dials on the digital dashboards that are increasingly found in modern cars. Having first turned up in the 1980s, these digital dashboards have come back into fashion in a big way, and with good reason. A digital dashboard can provide information in a more precise, more accurate way than a conventional dial while offering greater clarity. They can be adaptive and dynamic, showing exactly what the driver needs to see at any moment, and, if the manufacturer allows, they can be customised to suit a driver’s personal preferences. They can also look really, really cool, which counts for a lot these days.
Just like any other digital display, these dashboards require GPUs to power them, and, as a major player in the automotive space, Imagination provides solutions that are already widely used in the industry.
When putting a new car together, the OEM will test their digital dashboard designs with the latest GPUs – which brings us back to our problem – dials. The fact is that these dashboards are often very poorly designed in terms of geometry, and the dials are one of the worst offenders.
Let me explain. As you would expect, these dials need to be completely round. In fact, anything less round than round is considered a crime against good taste in the dashboard rendering community, and certainly would not be acceptable in the automotive industry.
However, this roundness is problematic because these dials are generally created using incredibly dense meshes, which leads to poor performance on the embedded GPUs used in cars.
This is a problem because, to cope with it, the GPU is likely to be over-specified, requiring far more horsepower than would otherwise be needed had the dials simply been designed sensibly in the first place.
How does this happen? Well, my understanding of how things go is as follows:
Essentially, simply adding more triangles is the easiest solution to the roundness problem in terms of authoring the assets and also in terms of integrating them into an existing renderer. It’s little surprise therefore that we see it done so often. However, as explained above, once the application gets deployed, poor performance inevitably ensues and it’s the hardware that gets blamed.
See below – the dial on the left is round – but not perfectly so, which means that it needs to be made more round (see cartoon above). Great! Done. So, let’s have a look at the newly created dial:
Looking at the wireframe on the right, we can see that it is made from a very high number of triangles (over 10K in total). But the issue here is not actually the high triangle count itself – even low-end GPUs can process an order of magnitude more triangles than this at a decent speed.
No, the real issue here is density: lots and lots of triangles packed together in an area only a few pixels wide. GPUs were designed to accelerate the rendering of relatively few triangles spanning relatively large amounts of pixels – but what we have here is the opposite, and that leads to serious performance losses.
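To put some rough numbers on that density problem, here's a quick back-of-the-envelope sketch. Only the 10K triangle count comes from the wireframe above; the on-screen dial size is an assumption for illustration:

```python
# Illustrative only: assume the dial covers roughly 300x300 pixels on screen.
dial_pixels = 300 * 300      # ~90K pixels covered by the dial (assumed)
triangles = 10_000           # triangle count taken from the dense mesh
pixels_per_triangle = dial_pixels / triangles
# -> 9.0 pixels per triangle on average
```

At single-digit pixels per triangle, a large share of the GPU's time goes into per-triangle setup rather than actually shading pixels, which is exactly the opposite of what rasterisers are built for.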
Not only will the dial be slow to render, it will also be very aliased. Every edge, indent, or bevel will end up showing pixel artefacts. In such situations, the only reasonable solution is to use Multi-Sample Anti-Aliasing (MSAA) to clean things up, which will make the renderer even slower.
Moving beyond geometry
The good news is that a dial is actually a pretty simple piece of geometry: it’s mostly flat and is usually viewed from a single angle. It can, therefore, be easily approximated with a simple transparent textured quad, essentially pre-computing all the geometry and lighting into something the GPU can handle very easily. The added bonus is that the texture can easily be tweaked to feather the edges for extra smoothness, and thus completely remove the need for MSAA.
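As a sketch of how that edge feathering might be baked into the texture's alpha channel offline, here's one possible falloff function. The radius and feather width are made-up values, not taken from any real asset:

```python
import math

def feathered_alpha(u, v, radius=0.48, feather=0.02):
    """Alpha for a texel at UV (u, v) on a disc centred at (0.5, 0.5).

    Fully opaque inside `radius`, fading linearly to fully transparent
    over a `feather`-wide band. This soft band is what keeps the dial's
    silhouette smooth without any MSAA.
    """
    d = math.hypot(u - 0.5, v - 0.5)
    if d <= radius:
        return 1.0
    if d >= radius + feather:
        return 0.0
    return 1.0 - (d - radius) / feather
```

Because the falloff lives in the texture rather than in the geometry's silhouette, the smooth edge comes for free at render time.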
This baked texture on transparent quad approach is pictured above. Although this is extremely cheap to render, there are a few potential issues that may have to be solved, depending on the quality the designer is hoping to achieve:
- The whole object has to use transparency (which is generally not recommended) even though only a small portion actually needs it.
- The whole area around the dial is fully transparent. Transparent pixels still get rasterised and have their shader executed, which is a bit wasteful.
- When zooming in, texels are visible. This might be acceptable in some situations.
- Lighting is completely baked in, which can be problematic if anything needs to be animated.
We can start by reducing the number of wasted pixels by making our shape a bit rounder. This is essentially a trade-off between the pixel and vertex count: a more tessellated disc will match the original shape more closely and will require less transparency. In practice, there’s no need to go very high in terms of vertex count and a dozen sides is already good enough.
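Here's one possible way to build that low-poly disc, as a simple triangle fan. The function name and the twelve-sided default are just for illustration:

```python
import math

def disc_fan(sides=12, radius=1.0):
    """Build a low-poly disc as a triangle fan: one centre vertex plus
    `sides` rim vertices. A dozen sides already hugs the round dial
    closely enough that very little transparent area is wasted."""
    verts = [(0.0, 0.0)]
    verts += [(radius * math.cos(2 * math.pi * i / sides),
               radius * math.sin(2 * math.pi * i / sides))
              for i in range(sides)]
    # Triangle indices: (centre, rim_i, rim_i+1), wrapping at the end.
    tris = [(0, 1 + i, 1 + (i + 1) % sides) for i in range(sides)]
    return verts, tris

verts, tris = disc_fan(12)
# 13 vertices and 12 triangles -- trivial compared to the 10K-triangle
# original mesh.
```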
In this case, the amount of geometry is still very reasonable, and we have successfully removed almost all wasted transparent space.
Now, since only the outer edge of the dial needs transparency, we can split the object in two: the central disc opaque, and the outer ring transparent. This way, we get to keep the nice soft feathering on the edges without having to use alpha blending on the whole object. It does, however, mean that we have to submit more draw calls and render more triangles.
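A sketch of what that transparent rim might look like as geometry: paired inner/outer vertices forming a strip around the opaque disc. The radii here are arbitrary illustrative values:

```python
import math

def ring_strip(sides=12, r_inner=0.9, r_outer=1.0):
    """Rim geometry for the transparent feathered ring. Paired inner and
    outer vertices form a triangle strip around the opaque central disc,
    so alpha blending is only paid for on this thin band of pixels."""
    verts = []
    for i in range(sides + 1):          # +1 repeats the first pair to close the loop
        a = 2 * math.pi * i / sides
        c, s = math.cos(a), math.sin(a)
        verts.append((r_inner * c, r_inner * s))
        verts.append((r_outer * c, r_outer * s))
    return verts  # render as a triangle strip

# Draw call 1: the opaque disc, blending off.
# Draw call 2: this ring, blending on.
```

Two draw calls instead of one, but alpha blending is now confined to the narrow rim instead of covering the whole object.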
So far, we have managed to get a nice and clean render, but because the lighting is static, as soon as either the light or the geometry gets animated, the illusion of surface detail will be completely shattered. Depending on the style the designer is hoping to achieve, this can be acceptable, in which case the rest of this section would be irrelevant.
If a more dynamic lighting environment is needed, this rendering trick will be worth knowing: it is possible to simulate small surface details without having to resort to actual geometry. The most popular method is normal mapping. With normal mapping, a texture is used to distort the surface normal (i.e. the direction that is used when computing lighting) on a per-texel basis.
In a normal map, the colours indicate how the geometric normal must be modified along the three axes: the red channel modifies the normal laterally, the green channel handles the vertical axis, and the blue channel modifies the outward projection. A good way to make sense of the texture visually is to keep in mind that the bluest parts are the ones that will show the least amount of change.
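The decoding step itself is tiny. Here's a sketch of the common 8-bit encoding, where each channel maps [0, 255] back to a [-1, 1] component (exact conventions vary between tools, so treat this as illustrative):

```python
def decode_normal(r, g, b):
    """Map an 8-bit normal-map texel back to a tangent-space vector.
    Each channel stores a [-1, 1] component remapped into [0, 255].
    Note that 128 decodes to approximately (not exactly) zero."""
    return tuple(c / 255.0 * 2.0 - 1.0 for c in (r, g, b))
```

This is also why the "flat" colour of a normal map is that familiar lavender blue: (128, 128, 255) decodes to roughly (0, 0, 1), a normal pointing straight out of the surface with no perturbation at all.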
Most 3D authoring packages are capable of generating such textures fairly easily from a pair of low and high-density meshes.
Using normal maps requires some changes in the renderer. First of all, the geometric data must now also include a per-vertex tangent space, which is a set of three directions (one of them being the normal). The tangent space is the 3D space the normal map refers to – when the normal map indicates that the normal must be skewed left or right, the tangent space converts it to a real 3D direction. This tangent space has to be processed in the vertex shader, passed on to the pixel shader, and finally used to decode the normal map, which means an increase in bandwidth and processing cost.
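In code, that conversion is just a change of basis: the decoded tangent-space normal is re-expressed using the interpolated tangent, bitangent, and geometric normal. A Python sketch of what the pixel shader effectively computes, assuming an orthonormal TBN frame:

```python
def tangent_to_world(n_ts, tangent, bitangent, normal):
    """Rotate a tangent-space normal into world space using the
    per-vertex TBN frame (tangent, bitangent, geometric normal).
    Each basis vector is assumed unit length and orthogonal."""
    return tuple(
        n_ts[0] * tangent[i] + n_ts[1] * bitangent[i] + n_ts[2] * normal[i]
        for i in range(3)
    )

# For a flat quad facing +Z: T=(1,0,0), B=(0,1,0), N=(0,0,1).
world_n = tangent_to_world((0.0, 0.0, 1.0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
# An unperturbed texel keeps the geometric normal: (0.0, 0.0, 1.0)
```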
We now have a piece of flat geometry that will react to lights the same way as the original mesh. Quick quality check: can you tell which one is the original and which one is normal mapped?
The answer is actually obvious, with the original one being on the right, showing some clear aliasing artefacts.
With some understanding of normal mapping and tangent spaces, we can take things a little bit further and improve our texture usage. Basically, instead of baking the whole shape into a texture, we only extract the normal map for a wedge and have it replicated circularly. Because the normal map is relative to a tangent space, and because the tangent space gets rotated with its geometry, the maths holds up and everything works as expected.
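To see why the maths holds up, here's a sketch of decoding the shared wedge normal map for an arbitrary wedge: the tangent frame is rotated by the same angle as the wedge's geometry, so a "sideways" bump in the texture always ends up tangential in world space. The function names and the 24-wedge count are illustrative:

```python
import math

def rotate2(v, angle):
    """Rotate a 2D vector in the dial's plane."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def wedge_world_normal(n_ts, wedge_index, sides=24):
    """Decode the shared wedge normal map for any wedge of the dial.
    The tangent frame is rotated together with the wedge's geometry,
    so one small texture serves the whole way around the disc."""
    angle = 2 * math.pi * wedge_index / sides
    t = rotate2((1.0, 0.0), angle)          # rotated tangent
    b = rotate2((0.0, 1.0), angle)          # rotated bitangent
    # The normal component stays out of the plane; only the in-plane
    # components get re-expressed in the rotated frame.
    x = n_ts[0] * t[0] + n_ts[1] * b[0]
    y = n_ts[0] * t[1] + n_ts[1] * b[1]
    return (x, y, n_ts[2])
```

A bump that tilts "left" in the wedge's texture tilts tangentially in world space for every wedge, so the lighting stays consistent all the way around the dial.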
The number of triangles is once again entirely up to you: thinner triangles mean more of them but less texture waste.
With this technique, things do unfortunately get a bit messy towards the centre, making the triangles pretty obvious at certain angles:
I’m not entirely sure what the root cause is but it’s likely to be a texture precision and filtering issue: as we move towards the central part, the number of texels spanning the wedge gets very small, and the tiniest issue with even one of them will have a serious impact. Most normal map bakers can add padding to the texture, which somewhat reduces the issue, but this didn’t do the trick in this case. The pragmatic solution to this is to add a small cap in the middle that will be textured independently and call it a day.
Looking at the texture setup, it’s pretty clear that this method saves a lot of texture space:
Visually, the detail quality is drastically improved, with the wedge method remaining smooth and artefact-free even when zooming in:
By removing the need for mostly redundant data, we have successfully increased texel density and reduced the actual texture resolution at the same time. Not bad.
After so many iterations, maybe now is a good time to have a quick reminder of how far we’ve come. We used to have a dense mesh producing aliasing that no reasonable amount of MSAA could clean up, and we now have a very small mesh combined with a fairly small texture that is pretty much guaranteed to look good at any scale without the need for any MSAA.
But what about performance? We may have removed geometry, but we have added a lot of complexity everywhere else: we now have to draw more objects, use alpha blending, and run more complex fragment shaders.
To measure performance, I have made a somewhat artificial benchmark displaying multiple dials being lit by a single dynamic light source. It was then deployed and run on the device with the smallest GPU I could find (a PowerVR GE8300), and the performance data was retrieved and analysed using PVRTune.
Pro tip: if your wireframe looks like a solid render, you have way too many triangles.
The following graph shows how the GPU’s time is being spent when rendering our benchmark. The blocks below it represent tasks, colour-coded by frame number. The geometry tasks process the geometry, which includes running the vertex shaders, culling back-facing or off-screen triangles, and so on. The rendering tasks represent the work that goes on at a pixel level, such as rasterising the triangles and running pixel shaders.
As expected, a fair amount of time is spent processing the geometry, but it is still an acceptable workload even for a mobile GPU. What is more surprising is the amount of time needed to turn that geometry into pixels: 19 milliseconds. We can all agree that this is high when your time budget for a whole frame is around 16ms. The pixel shader used here is trivial and should be absolutely no problem for any GPU, so what exactly is taking so long?
Well, as mentioned earlier, rasterisers do not handle long thin triangles well at all and that can be seen on the graph above: the pixel processing load counter represents the portion of pixel processing time spent running the pixel shader, which is expected to be close to 100% in normal circumstances. However, in this case, it doesn’t even reach 60%, meaning that the GPU is stuck doing something else, most likely struggling to rasterise our terribly-shaped geometry.
And with 4xMSAA enabled, things get even worse, with the application dropping from 50 to 30 frames per second (fps). Now 30fps for a clean full HD render is not necessarily terrible in itself and is actually pretty standard for many 3D applications and games, but I suspect (and hope) that this would not be the case for such a critical automotive component.
So let’s now move on to the optimised dial renderer. Visually, no big surprise: the wireframe looks reasonable and the final frame looks perfect.
But in the figure below, we can see how it now runs at a full V-synced 60 frames per second, meaning that there is plenty of room left for extra features and content in the application. This is quite an improvement!
It is interesting to note that although we have given more work to the pixel shader, the rendering task now completes a lot quicker. This is due to several factors. The general idea is that instead of focusing all the processing onto a single part of the GPU that will end up completely overworked, we spread the work across different specialised units. Furthermore, the more reasonable mesh density no longer causes inefficiencies in the rasteriser, which can be seen in the pixel processing load counter now reaching almost 100%.
What we’ve shown is that while mobile GPUs have come a long way in terms of performance, we haven’t quite reached the point where you can just brute-force everything. We’ve demonstrated that designing in a more subtle and clever way doesn’t have to be overwhelmingly complicated. It does require a bit more direction, as both the engineers and artists must get more involved and collaborate more closely, but the benefits can be substantial. In our example, we’ve gone from an application that struggles to render at an acceptable framerate to one that renders images a lot faster than the device can display them, and produces images that are a lot cleaner.
The upshot is that even our modest sized mobile GPUs are more than capable of running the latest, sleekest-looking digital dashboards efficiently at very smooth frame rates. And most importantly, with perfectly round dials.