Today I’d like to present some very exciting results which demonstrate how ray tracing delivers significant reductions in memory bandwidth and power consumption over traditional rasterized methods (i.e. cascaded shadow maps).
Complete results: cascaded shadow maps vs. ray traced shadows
The images below compare an implementation of four-slice cascaded shadow maps at 2K resolution with ray traced shadows. In the ray traced case, we retain shadow definition and accuracy where the distance between the shadow casting object and the shadow receiver is small; by contrast, cascaded shadow maps often overblur, ruining shadow detail.
Viewed at full resolution, the images reveal the severe loss of image quality that occurs in cascaded shadow maps.
In the second and third examples, we’ve removed the textures so we can highlight the shadowing.
Optimizing the ray tracing algorithms
The diagram below describes the initial implementation of the ray tracing hybrid model described in this article.
The first optimization we can make is to cast fewer rays. We can use dot(N, L), the dot product of the surface normal N and the direction to the light L, to establish whether a surface faces away from the light. If dot(N, L) is less than or equal to 0, we don’t need to cast any rays because we can assume the pixel is shadowed by virtue of facing away from the light.
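The rejection test above can be sketched in a few lines. This is only an illustration of the idea, not the shader used in the demo: the function names (`needs_shadow_ray`, `count_shadow_rays`) are hypothetical, and the normal and light direction are assumed to be given as 3-component tuples in the same coordinate space.

```python
def dot(a, b):
    """Dot product of two 3-component vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def needs_shadow_ray(normal, to_light):
    """Return True only when the surface faces the light.

    When dot(N, L) <= 0 the surface faces away from the light, so the
    pixel is treated as shadowed and no ray is cast for it.
    """
    return dot(normal, to_light) > 0.0


def count_shadow_rays(normals, to_light):
    """Count how many pixels still need a shadow ray after rejection."""
    return sum(1 for n in normals if needs_shadow_ray(n, to_light))


if __name__ == "__main__":
    light = (0.0, 1.0, 0.0)  # assumed direction from the surface to the light
    normals = [
        (0.0, 1.0, 0.0),   # facing the light: cast a ray
        (0.0, -1.0, 0.0),  # facing away: skip
        (1.0, 0.0, 0.0),   # perpendicular (dot == 0): skip
    ]
    print(count_shadow_rays(normals, light), "of", len(normals), "pixels need a ray")
```

Only the sign of the dot product matters for this test, so the vectors do not need to be normalized.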
Looking at the rendering pipeline, there are further optimizations we can make. The diagram below shows the standard deferred rendering approach; this approach involves many read and write operations and costs bandwidth (and therefore power).
The first of these optimizations is to reduce the amount of data in each buffer by using data types with no more bits than the bare minimum needed; for example, we can pack our distance density buffer into only 8 bits by normalizing the distance value between 0 and 1, since it doesn’t require very high precision. The next step is to collapse passes; if we use the framebuffer fetch extension, we can collapse the ray tracing and G-Buffer passes into one, saving all of the bandwidth that the ray emission pass would otherwise spend reading the G-Buffer.
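To make the 8-bit packing concrete, here is a minimal sketch of quantizing a normalized distance into a single byte and recovering it. The helper names and the `max_distance` normalization range are assumptions made for illustration; the article does not say how the distance is normalized in the actual implementation.

```python
def pack_distance(distance, max_distance):
    """Quantize a distance into one byte (0..255).

    The distance is first normalized to [0, 1] against an assumed
    max_distance, clamped, then rounded to the nearest 8-bit step.
    """
    normalized = min(max(distance / max_distance, 0.0), 1.0)
    return int(normalized * 255.0 + 0.5)


def unpack_distance(byte_value, max_distance):
    """Recover an approximate distance from the packed byte."""
    return (byte_value / 255.0) * max_distance


if __name__ == "__main__":
    max_distance = 10.0  # hypothetical normalization range
    packed = pack_distance(3.7, max_distance)
    restored = unpack_distance(packed, max_distance)
    print(packed, restored)
```

Under these assumptions, the worst-case error for in-range values is half a quantization step, i.e. `max_distance / 510`, and each pixel stores one byte instead of the four a 32-bit float would need.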
Memory bandwidth usage analysis
Before we look at the final numbers, let’s spend some time looking at memory traffic. Bandwidth is the amount of data that is accessed from external memory; memory traffic consumes bandwidth. Every time a developer codes a texture fetch, the shading cluster (USC) in a PowerVR Rogue GPU will look for it in cache memory; if the texture is not stored locally in cache, the USC will access DRAM to get the value. Every access to external memory incurs significant latency and makes the device consume more power. When optimizing a mobile application, the developer’s goal is always to minimize accesses to memory.
By using specialized instruments to look at bandwidth usage, we can compare cascaded shadow maps with ray traced soft shadows on a PowerVR Wizard GPU. In total, the cascaded shadow maps implementation generates about 233 MB of memory traffic, while the same scene rendered with ray traced soft shadows requires only 164 MB. For ray tracing, there is an initial one-time setup cost of 61 MB due to the acceleration structure that must be built for the scene.
This structure can be reused from frame to frame, so it isn’t part of the totals for a single frame. We’ve also measured the G-Buffer independently to see how much of our total cost results from this pass.
Therefore, by subtracting the G-Buffer value from the total memory traffic value, shadowing using cascaded shadow maps requires 136 MB while ray tracing requires only 67 MB, roughly a 50% reduction in memory traffic.
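The arithmetic behind these figures can be checked with a short script. It uses only the four values quoted above; the G-Buffer cost is not stated directly in the text, so the 97 MB below is derived from the totals rather than an independently reported measurement.

```python
# Memory traffic figures quoted in the text, in MB.
csm_total_mb = 233    # cascaded shadow maps, whole frame
rt_total_mb = 164     # ray traced soft shadows, whole frame
csm_shadow_mb = 136   # cascaded shadow maps, shadowing only
rt_shadow_mb = 67     # ray traced soft shadows, shadowing only

# The G-Buffer cost implied by each pair of figures (total - shadowing).
gbuffer_from_csm = csm_total_mb - csm_shadow_mb
gbuffer_from_rt = rt_total_mb - rt_shadow_mb

# Relative reductions in memory traffic.
shadow_reduction = 1.0 - rt_shadow_mb / csm_shadow_mb
frame_reduction = 1.0 - rt_total_mb / csm_total_mb

if __name__ == "__main__":
    print(f"Implied G-Buffer traffic: {gbuffer_from_csm} MB / {gbuffer_from_rt} MB")
    print(f"Shadowing traffic reduction: {shadow_reduction * 100:.1f}%")
    print(f"Whole-frame traffic reduction: {frame_reduction * 100:.1f}%")
```

Both subtractions give the same 97 MB G-Buffer cost, which is what we would expect since the G-Buffer pass is common to both approaches. The roughly 50% figure applies to shadowing traffic alone; measured over the whole frame, including the shared G-Buffer, the reduction is closer to 30%.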
We notice similar effects in other views of the scene, depending on how many rays we are able to reject and how much filtering we have to perform. Overall, we get an average 50% reduction in memory traffic using ray traced shadows.
Looking at total cycle counts, the picture is even better; we see an impressive speed boost from the ray traced shadows. Because the different rendering passes are pipelined in both apps (i.e. the ray traced shadows app and the cascaded shadow maps app) we are unable to separate how many clocks are used for which pass. This is because portions of the GPU are busy executing work for multiple passes at the same time.
Using ray tracing to implement soft shadows leads to a 50% speedup on a PowerVR Wizard GPU
However, the switch to ray traced shadows resulted in a doubling of the performance for the entire frame!
We hope you’ve enjoyed our two articles about ray tracing and PowerVR Wizard GPUs; we look forward to sharing some exciting news and real-world demonstrations in the near future – stay tuned to our blog and social media accounts for more details coming soon!
Additional resources on PowerVR Ray Tracing
For those interested in finding out more information on our PowerVR Ray Tracing technology, here is a selection of available resources from our archives:
- An introduction to ray tracing and the fundamentals of our PowerVR Wizard architecture (Ray tracing made easy)
- A blog article about the PowerVR GR6500 ray tracing GPU (PowerVR GR6500: ray tracing is the future… and the future is now) and an in-depth article detailing how ray tracing works (Ray tracing: the future is now)
- A featured post on Gamasutra describing practical techniques for ray tracing in games (Practical techniques for ray tracing in games)
- A guide on how we’ve implemented hybrid rendering in an experimental version of the Unity game engine (Implementing hybrid ray tracing in a rasterized game engine)
- A short preview of the PowerVR Ray Tracing-enabled Unity lightmap editor (PowerVR Ray Tracing behind the lightmaps in the Unity editor)
- The story of how we taped out our PowerVR GR6500 ray tracing GPU in a test chip (Award-winning PowerVR GR6500 ray tracing GPU tapes out)
A big thank you to Justin DeCell for his huge contribution to developing the analytical soft shadows technique and for his help in making this happen.