The real-time graphics pipeline, as modelled by today’s GPUs and their associated stack of drivers, compilers and APIs, accrues features at a rapid clip. How you program the entire system changes just as quickly, with the ecosystem guardians and the hardware vendors that provide products for them taking advantage of the fact that GPUs have no standard ISA to implement.
The absence of such a standard enables hardware vendors to innovate in the implementation of the design and any new features and lets the software ecosystem take advantage of those innovations and surface them to the programmer, thus advancing the capabilities and efficiencies afforded by the APIs programmers use.
Recently, that innovation has added ray tracing to the programmer’s box of tricks. With heavy investment in a new hardware architecture that accelerates it, and a big pile of cash spent with game developers to bring real life to that new hardware, Nvidia bootstrapped real-time ray tracing acceleration in a big way. Other companies, including ours, had tried to light the blue touch paper before, but there’s just no substitute for being able to bring enormous resources to bear in order to convince programmers to take a look.
That application of serious human effort, in the form of Nvidia’s research team and developer technology group, along with money to tempt game publishers into taking a risk on a fledgling rendering technology, mostly solved the ecosystem development problem. Ray tracing always had promise, but it had never quite got off the ground in the real-time domain on GPUs.
Today, as the industry works on fleshing out the accelerated ray tracing ecosystem on Windows-based PCs and consoles, and eventually in other markets (with bigger GPUs leading the way), there’s a missing piece of the puzzle that’s yet to show up: ray tracing performance analysis software.
The nuances of ray tracing
First, let’s examine why. It’s not often that a new real-time rendering feature requires its own dedicated analysis tooling in order to assess performance. Most new features have a limited footprint in all senses, be that the amount of hardware required to implement them, the developer effort to integrate them into a rendering system, the size of the driver implementation, what’s needed in any tools, or their impact on the rest of what’s happening in a frame.
Ray tracing is none of those things. A good implementation costs considerable area, and effective integration into a rendering system is only “easy” if you’re using it to implement something contained in scope and complexity, such as shadows. The driver and compiler work to add support for Vulkan Ray Tracing or DirectX Raytracing (DXR) is involved and complex, brand new tooling is required, and it has a big impact on your rendering system as a whole.
The inherent complexity of real-time ray tracing acceleration as a feature (that big footprint) means there’s a lot of scope for implementation choices to have a big effect on performance. So, more than any other addition to the real-time rendering pipeline in recent memory, ray tracing needs good tools to peek under the covers of each implementation so that developers can learn how to adapt their code.
That’s exacerbated by the fact that there’s only really one implementation on the market today: Nvidia’s Turing. While Nvidia did release a DXR driver for their prior GPU generation, Pascal, it’s really only viable for prototyping and limited experimentation; Pascal lacks any hardware acceleration for DXR or Vulkan Ray Tracing and so it’s really slow!
So, it’s a real risk for developers to have only one implementation on the market to target. That’s arguably good for Nvidia, since almost all DXR and Vulkan ray tracing code being written today will be running on its GPUs, but not good when developers want to take their games and run them on implementations from other vendors in the future.
They’ll quickly find out that there is a spectrum of ways ray tracing can be implemented in a GPU, and that understanding the differences is laden with nuance, such as performance cliffs and non-obvious bottlenecks. There’s no substitute for profiling your own workloads in depth to get the best possible view, of course, but there’s most definitely room for software to help you understand how the hardware works and what it’s capable of.
3DMark Port Royal is a real-time ray tracing benchmark but was built using Nvidia’s Turing, presenting a limited view of the nuances of other ray tracing architectures.
That’s spectacularly true when it comes to IP selection for licensable GPUs with ray tracing capabilities. The way companies select IP for their products is by requesting a swathe of performance data from the IP vendor, but in the case of ray tracing, what performance data do you ask for? There’s not a huge amount of DXR game content, what exists is all targeted at PCs with big Turing GPUs anyway, and there’s absolutely nothing for Vulkan-based ray tracing yet.
In the DXR benchmarking space, there is 3DMark Port Royal from the excellent Futuremark team, which implements an interesting view of how to use DXR in a rendering system, but it was written using Turing (and pre-RTX Pascal!), and thus could only consider ray tracing as a thing to measure in that limited context. So, it’s third-party and independent and exists, but it’s not enough.
There are also no real DXR microbenchmarks to go along with games and something like 3DMark Port Royal. Microbenchmarks test small facets of a semiconductor design in a targeted fashion, and with ray tracing that’s particularly interesting since it’s such a black box with a large surface area that has a big impact on the rest of the GPU and GPU programming model.
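To make the idea of a targeted microbenchmark concrete, here is a minimal sketch in Python. It is only a CPU stand-in and says nothing about any GPU: it isolates one small facet (a ray versus axis-aligned box slab test, the kind of check performed during hierarchy traversal), warms up, times several repeats and reports the median. All names here (ray_hits_box, make_rays, run_microbenchmark) are hypothetical and chosen for illustration; a real DXR or Vulkan microbenchmark would issue GPU work and use GPU timestamps instead.

```python
import random
import statistics
import time


def ray_hits_box(origin, inv_dir, box_min, box_max):
    """Slab test: does a ray (t >= 0) intersect an axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t0, t1 = (lo - o) * inv, (hi - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near <= t_far


def make_rays(count, seed=0):
    """Build a reproducible set of (origin, reciprocal direction) pairs."""
    rng = random.Random(seed)
    rays = []
    for _ in range(count):
        origin = tuple(rng.uniform(-5.0, 5.0) for _ in range(3))
        # Keep components away from zero so the reciprocal is well defined.
        direction = tuple(
            rng.choice((-1.0, 1.0)) * rng.uniform(0.1, 1.0) for _ in range(3)
        )
        rays.append((origin, tuple(1.0 / d for d in direction)))
    return rays


def run_microbenchmark(rays, box_min, box_max, warmup=2, repeats=5):
    """Time only the box tests; report the median of several repeats."""

    def one_pass():
        return sum(ray_hits_box(o, inv, box_min, box_max) for o, inv in rays)

    for _ in range(warmup):
        one_pass()  # untimed passes to let caches and the interpreter settle

    samples = []
    hits = 0
    for _ in range(repeats):
        start = time.perf_counter()
        hits = one_pass()
        samples.append(time.perf_counter() - start)

    median = statistics.median(samples)
    rate = len(rays) / median if median > 0 else float("inf")
    return {"hits": hits, "median_seconds": median, "tests_per_second": rate}


if __name__ == "__main__":
    result = run_microbenchmark(make_rays(10_000), (-1, -1, -1), (1, 1, 1))
    print(result)
```

The design choice worth noting is that the timed region contains nothing but the facet under test: ray generation, warm-up and reporting all sit outside it, and the input is seeded so that runs are comparable.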
As a result, developers, the public and those performing IP selection for a future GPU that includes ray tracing have very little to go on. So, what’s the solution?
Moving beyond peak ray flow
Traditionally a wider range of independent benchmark vendors would cover some of the interesting space of tests for GPUs, but there aren’t many of those left and leaving it to just Futuremark has its own problems. Plus, that would still leave us with the directed microbenchmark problem.
So, we call on the GPU performance analysis community to take a closer look at the new ray tracing subsystems present in the major graphics APIs. That would allow the entire ecosystem, from developers creating ray tracing workloads to deploy on products to companies performing IP selection on a ray tracing GPU such as ours, to understand the benefits, pitfalls, gotchas and limitations of available implementations.
There’s a lot more to advertised ray tracing performance than just peak ray flow. Hierarchy traversal performance, parallel tester performance, hierarchy build and refit performance, recursion, inline dispatch, the various shader types — they all need careful observation. It’s not even possible to define what peak ray flow means today since Nvidia doesn’t explain how it measures its number, or even if rays, you know, flow!
That means there’s a real gap in the market for good ray tracing microbenchmarks, along with games, especially those that in the future will use Vulkan Ray Tracing. Every GPU vendor on the planet that’s working on a ray tracing GPU of any kind would welcome good tools to allow for analysis, understanding and comparison of what’s out there, including their own. There’s so much room for interpretation and implementation with ray tracing that it’s almost unique in its need for closer analysis, and we hope someone steps up to provide an independent scientific view.