In an article published earlier this week, I described how developers can implement fast, ray traced soft shadows in a game engine running on our PowerVR Wizard architecture.

Today I’d like to present some exciting results that show how ray traced shadows significantly reduce memory traffic, and therefore power consumption, compared with a traditional rasterized method (i.e. cascaded shadow maps).

Complete results: cascaded shadow maps vs. ray traced shadows

The images below compare an implementation of four-slice cascaded shadow maps at 2K resolution with ray traced shadows. In the ray traced case, we retain shadow definition and accuracy where the distance between the shadow casting object and the shadow receiver is small; by contrast, cascaded shadow maps often overblur, ruining shadow detail.

At full resolution, the images reveal the severe loss of image quality that occurs with cascaded shadow maps.

PowerVR Ray Tracing - cascaded vs ray traced-3

PowerVR Ray Tracing - cascaded vs ray traced-1

PowerVR Ray Tracing - cascaded vs ray traced-2

In the second and third examples, we’ve removed the textures so we can highlight the shadowing.

Optimizing the ray tracing algorithms

The diagram below shows the initial implementation of the hybrid ray tracing model described in the previous article.

PowerVR Ray Tracing - rendering pipeline-1f

The first optimization we can make is to cast fewer rays. We can use the result of dot(N, L) to establish whether a surface faces away from the light. If dot(N, L) is less than or equal to 0, we don’t need to cast any rays, because we can assume the pixel is shadowed by virtue of facing away from the light.
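As a minimal sketch of this rejection test (not the code used in the demo), the Python below decides per pixel whether a shadow ray is worth casting. The names `dot`, `needs_shadow_ray` and `count_shadow_rays` are hypothetical, vectors are plain tuples, and L is assumed to point from the surface toward the light.

```python
def dot(a, b):
    """Dot product of two 3-component vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def needs_shadow_ray(normal, to_light):
    """Return True only when the surface faces the light (N.L > 0).

    Only the sign of the dot product matters here, so the vectors do not
    need to be normalized for this test.
    """
    return dot(normal, to_light) > 0.0


def count_shadow_rays(normals, to_light):
    """Count how many pixels still need a shadow ray after the N.L test."""
    return sum(1 for n in normals if needs_shadow_ray(n, to_light))


# Hypothetical sample: one pixel facing the light, one facing away,
# and one exactly perpendicular to it.
sample_normals = [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0), (1.0, 0.0, 0.0)]
light_direction = (0.0, 1.0, 0.0)
rays_to_cast = count_shadow_rays(sample_normals, light_direction)
```

Pixels that fail the test can simply be written out as fully shadowed, so the saving scales with how much of the visible geometry faces away from the light.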

PowerVR Ray Tracing - rendering pipeline-2f

Looking at the rendering pipeline, there are further optimizations we can make. The diagram below shows the standard deferred rendering approach; this approach involves many read and write operations and costs bandwidth (and therefore power).

The first pipeline optimization we’ve made is to reduce the amount of data in each buffer by using data types with no more bits than the bare minimum needed; for example, we can pack our distance density buffer into only 8 bits by normalizing the distance value between 0 and 1, since it doesn’t require very high precision. The next step is to collapse passes; if we use the framebuffer fetch extension, we can collapse the ray tracing and G-Buffer passes into one, which saves all of the bandwidth otherwise spent reading the G-Buffer back in the ray emission pass.
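To make the 8-bit packing concrete, here is a rough sketch of normalizing a distance into the 0 to 1 range and quantizing it to a single byte. The function names, the `max_distance` parameter, the 1920x1080 resolution and the 32-bit comparison are all assumptions chosen for illustration; the original buffer format is not stated here.

```python
def pack_distance(distance, max_distance):
    """Normalize a distance to [0, 1] and quantize it to an 8-bit code."""
    t = min(max(distance / max_distance, 0.0), 1.0)  # clamp to [0, 1]
    return int(round(t * 255))


def unpack_distance(code, max_distance):
    """Recover an approximate distance from its 8-bit code."""
    return code / 255.0 * max_distance


def buffer_bytes(width, height, bits_per_pixel):
    """Size of a full-screen buffer in bytes."""
    return width * height * bits_per_pixel // 8


# Assumed values for illustration only.
MAX_DISTANCE = 10.0
original = 3.3
restored = unpack_distance(pack_distance(original, MAX_DISTANCE), MAX_DISTANCE)
quantization_error = abs(restored - original)

# An 8-bit buffer is a quarter of the size of a 32-bit one at the same
# resolution, so every read or write of it moves a quarter of the data.
size_8bit = buffer_bytes(1920, 1080, 8)
size_32bit = buffer_bytes(1920, 1080, 32)
```

The worst-case error of this scheme is half a quantization step, that is `max_distance / 510`, which is the precision being traded for the smaller buffer.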

PowerVR Ray Tracing - rendering pipeline-3f

Memory bandwidth usage analysis

Before we look at the final numbers, let’s spend some time looking at memory traffic. Memory traffic is the data that moves between the GPU and external memory, and it is this traffic that consumes bandwidth. Every time a shader issues a texture fetch, the shading cluster (USC) in a PowerVR Rogue GPU looks for the data in cache; if the texture data is not held locally in cache, the USC must access DRAM to get the value. Every access to external memory incurs significant latency and makes the device consume more power. When optimizing a mobile application, a developer’s goal is always to minimize accesses to external memory.
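The effect of cache hits and misses on external memory accesses can be illustrated with a toy model. The sketch below is a simple least-recently-used cache written for this explanation only; it does not describe the actual cache organization of any PowerVR GPU, and the class name, capacity and access patterns are hypothetical.

```python
from collections import OrderedDict


class TinyTextureCache:
    """Toy LRU cache that counts how many fetches fall through to DRAM."""

    def __init__(self, capacity_lines):
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # cache line address -> present
        self.hits = 0
        self.dram_reads = 0

    def fetch(self, line_address):
        if line_address in self.lines:
            # Hit: served from cache, no external memory access.
            self.lines.move_to_end(line_address)
            self.hits += 1
            return
        # Miss: the line has to be read from external memory.
        self.dram_reads += 1
        self.lines[line_address] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line


def dram_reads_for(pattern, capacity_lines=2):
    """Run an access pattern through a fresh cache and return the DRAM reads."""
    cache = TinyTextureCache(capacity_lines)
    for address in pattern:
        cache.fetch(address)
    return cache.dram_reads


# Neighbouring fetches reuse cache lines; scattered fetches keep missing.
coherent_reads = dram_reads_for([0, 0, 1, 1])
scattered_reads = dram_reads_for([0, 1, 2, 0])
```

Both patterns issue four fetches, but the coherent one touches external memory less often, which is the behaviour a bandwidth-conscious renderer tries to encourage.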

PowerVR Ray Tracing - bandwidth

By using specialized instruments to look at bandwidth usage, we can compare cascaded shadow maps with ray traced soft shadows on a PowerVR Wizard GPU. In total, the cascaded shadow maps implementation generates about 233 MB of memory traffic per frame, while the same scene rendered with ray traced soft shadows requires only 164 MB. For ray tracing, there is an initial one-time setup cost of 61 MB due to the acceleration structure that must be built for the scene.

This structure can be reused from frame to frame, so it isn’t part of the totals for a single frame. We’ve also measured the G-Buffer independently to see how much of our total cost results from this pass.

PowerVR Ray Tracing - efficiency analysis

Subtracting the G-Buffer value from the total memory traffic value, shadowing using cascaded maps requires 136 MB while ray tracing requires only 67 MB, roughly a 50% reduction in memory traffic.
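The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the numbers quoted above; the variable and function names are hypothetical, and the G-Buffer cost and whole-frame reduction are derived here from those numbers rather than measured separately.

```python
def reduction(before_mb, after_mb):
    """Fractional reduction in memory traffic when going from before to after."""
    return 1.0 - after_mb / before_mb


# Per-frame totals and shadow-only figures quoted in the text (MB).
csm_total, rt_total = 233, 164
csm_shadows, rt_shadows = 136, 67

# The G-Buffer cost implied by each pair of figures; both should agree,
# since the G-Buffer pass is common to the two implementations.
g_buffer_from_csm = csm_total - csm_shadows
g_buffer_from_rt = rt_total - rt_shadows

shadow_reduction = reduction(csm_shadows, rt_shadows)  # shadowing only
frame_reduction = reduction(csm_total, rt_total)       # G-Buffer included
```

Both subtractions give the same 97 MB for the G-Buffer. The shadowing work alone shrinks by just over half, while the whole frame, G-Buffer included, shrinks by a little under 30%.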

PowerVR Ray Tracing - efficiency analysis-2

We see similar effects in other views of the scene, depending on how many rays we are able to reject and how much filtering we have to perform. Overall, we get an average reduction of about 50% in memory traffic using ray traced shadows.

PowerVR Ray Tracing - efficiency analysis-3

Looking at total cycle counts, the picture is even better; we see an impressive speed boost from the ray traced shadows. Because the different rendering passes are pipelined in both apps (i.e. the ray traced shadows app and the cascaded shadow maps app) we are unable to separate how many clocks are used for which pass. This is because portions of the GPU are busy executing work for multiple passes at the same time.

Using ray tracing to implement soft shadows leads to a 50% speedup on a PowerVR Wizard GPU

Even so, the switch to ray traced shadows doubled performance for the entire frame!

We hope you’ve enjoyed our two articles about ray tracing and PowerVR Wizard GPUs; we look forward to sharing some exciting news and real-world demonstrations in the near future – stay tuned to our blog and social media accounts for more details coming soon!

Make sure you follow us on Twitter (@ImaginationPR, @PowerVRInsider, @PowerVR_RT) to get the latest news and announcements for the PowerVR ecosystem.

A big thank you to Justin DeCell for his huge contribution on developing the analytical soft shadows technique and for his help in making this happen.