Physically based rendering (PBR) is becoming more prevalent on mobile class GPUs. In this blog post I will give a quick overview of what PBR is, what the advantages and disadvantages of using it are, and some tips on how to use PBR and deferred rendering when running on a PowerVR GPU. I’ll also show you how we used PBR in Dwarf Hall, a recent OpenGL ES demo we produced.
Physically based rendering has no precise definition; it is really a set of guidelines for rendering a scene in a more scientific and intuitive way. Specular lighting is a good example. Before PBR, many graphical applications used ad-hoc material parameters such as a shininess value. This value could range from zero to infinity, and artists had to try several different values to see which one looked best.
This method is not intuitive and produces results of varying quality. It can also produce physically impossible results, i.e. a surface that reflects more light than it receives. PBR makes the input values for materials easier for both artists and programmers to understand.
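To make the energy problem concrete, the sketch below numerically integrates a Phong-style specular lobe over the hemisphere. It assumes the lobe is centred on the surface normal and uses the well-known (n + 2) / (2π) normalisation factor. The post does not say which specular model was used before the switch to PBR, so treat this purely as an illustration.

```python
import math


def phong_lobe_energy(n, normalised, steps=20000):
    """Fraction of incoming light reflected by a cos^n lobe centred on the normal.

    Integrates lobe(theta) * cos(theta) over the hemisphere with the midpoint
    rule; dω = sin(theta) dθ dφ, and the φ integral contributes a factor of 2π.
    """
    dtheta = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * dtheta
        lobe = math.cos(theta) ** n
        if normalised:
            lobe *= (n + 2) / (2 * math.pi)  # energy-conserving normalisation
        total += lobe * math.cos(theta) * math.sin(theta) * dtheta
    return total * 2 * math.pi


# A low-shininess lobe without normalisation reflects more light than it receives.
print(phong_lobe_energy(1, normalised=False))  # about 2.09
print(phong_lobe_energy(1, normalised=True))   # about 1.0
```

Without the factor the reflected energy is 2π / (n + 2), which exceeds 1 for any exponent below roughly 4.3 and drops towards 0 as the highlight sharpens; with it, the result is 1 for every exponent.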
Image based lighting
The specular contribution is an important part of a scene's lighting. One way to store the light arriving at a point in space from every direction (its radiance) is a cubemap. These cubemaps can be created with an offline renderer, so at runtime we merely need to look up a direction in the cubemap when shading a surface.
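For specular lighting, the lookup direction is usually the view vector reflected about the surface normal. The sketch below reflects a direction and then picks the cubemap face and texture coordinates using the major-axis face-selection table from the OpenGL ES specification. On the GPU, sampling a samplerCube does this selection in hardware; the code only illustrates what that lookup means.

```python
def reflect(i, n):
    """Reflect incident direction i about unit normal n (same as GLSL reflect)."""
    d = sum(a * b for a, b in zip(i, n))
    return tuple(a - 2.0 * d * b for a, b in zip(i, n))


def cubemap_face_uv(r):
    """Return (face, s, t) for direction r, following the GL major-axis table."""
    rx, ry, rz = r
    ax, ay, az = abs(rx), abs(ry), abs(rz)
    if ax >= ay and ax >= az:  # X is the major axis
        face, sc, tc, ma = ("+X", -rz, -ry, ax) if rx > 0 else ("-X", rz, -ry, ax)
    elif ay >= az:             # Y is the major axis
        face, sc, tc, ma = ("+Y", rx, rz, ay) if ry > 0 else ("-Y", rx, -rz, ay)
    else:                      # Z is the major axis
        face, sc, tc, ma = ("+Z", rx, -ry, az) if rz > 0 else ("-Z", -rx, -ry, az)
    # Project onto the face and remap from [-1, 1] to [0, 1].
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)


# Looking straight down at an upward-facing surface reflects the view upwards.
r = reflect((0.0, 0.0, -1.0), (0.0, 0.0, 1.0))
print(r, cubemap_face_uv(r))  # (0.0, 0.0, 1.0) ('+Z', 0.5, 0.5)
```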
In our demos we use a ray tracer to create these cubemaps offline from multiple points in the scene. We also use the mipmaps of each cubemap to represent the blurry, glossy reflections of rough surfaces (prefiltered mipmapped radiance environment maps, or PMREM). We used the method described by Sébastien Lagarde in his blog post here.
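With a prefiltered mip chain, the shader chooses a mip level from the surface roughness. A simple linear mapping is sketched below; it is an assumption for illustration, not necessarily the mapping used in the demo or in Lagarde's article, and other curves are common.

```python
import math


def roughness_to_mip(roughness, base_size):
    """Map roughness in [0, 1] to a mip level of a square cubemap face.

    Mip 0 (sharpest) is used for perfectly smooth surfaces and the last mip
    (blurriest) for fully rough ones. A fractional result is fine: sampling
    with textureLod and trilinear filtering blends the two nearest mips.
    """
    mip_count = int(math.log2(base_size)) + 1  # e.g. 256 px -> 9 levels (256..1)
    r = min(max(roughness, 0.0), 1.0)          # clamp out-of-range inputs
    return r * (mip_count - 1)


print(roughness_to_mip(0.5, 256))  # 4.0
```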
Lighting a scene from pre-captured images of its surroundings in this way is called image-based lighting (IBL), and it is commonly combined with PBR. We also store these radiance values in HDR so that post-processing effects can use them later on, and we use a tool to compress them into the efficient PVRTC format.
A PBR system usually takes only a handful of input parameters. We used the following inputs in our demo:
- Albedo colour
- Roughness (or smoothness)
- Specular colour
- Tangent-space normals
These inputs are enough for the PBR shader to produce good-looking results, and they are intuitive for artists and programmers to understand.
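To show how these inputs can feed a shader, the sketch below evaluates one common PBR model for a single light and a single colour channel: a Lambert diffuse term plus a Cook-Torrance specular term with the GGX distribution, Schlick's Fresnel approximation and a Schlick-GGX geometry term. The post does not say which BRDF the demo uses, so the model, the alpha = roughness² remapping and the (1 - F) diffuse weighting are all assumptions. The normal is assumed to have already been transformed from tangent space into the same space as the view and light vectors.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def shade(albedo, roughness, f0, n, v, l):
    """Outgoing radiance toward v for a unit-intensity light from l, one channel.

    n, v and l are unit vectors in the same space; albedo and f0 (the specular
    colour) are linear values; roughness is the perceptual value in [0, 1].
    """
    if dot(n, l) <= 0.0 or dot(n, v) <= 0.0:
        return 0.0  # light or viewer below the surface: no contribution
    h = normalise(tuple(a + b for a, b in zip(v, l)))  # half vector
    n_l = dot(n, l)
    n_v = max(dot(n, v), 1e-4)
    n_h = max(dot(n, h), 0.0)
    v_h = max(dot(v, h), 0.0)

    alpha = max(roughness * roughness, 1e-3)  # remap; clamp avoids 0/0 for mirrors
    # GGX (Trowbridge-Reitz) normal distribution function
    d = alpha ** 2 / (math.pi * (n_h ** 2 * (alpha ** 2 - 1.0) + 1.0) ** 2)
    # Schlick's approximation of the Fresnel term
    f = f0 + (1.0 - f0) * (1.0 - v_h) ** 5
    # Smith geometry term, Schlick-GGX approximation with k chosen for direct lights
    k = (roughness + 1.0) ** 2 / 8.0
    g = (n_v / (n_v * (1.0 - k) + k)) * (n_l / (n_l * (1.0 - k) + k))

    specular = d * f * g / (4.0 * n_v * n_l)
    diffuse = (1.0 - f) * albedo / math.pi  # Lambert, weighted by what Fresnel leaves
    return (diffuse + specular) * n_l


up = (0.0, 0.0, 1.0)
print(shade(0.5, 1.0, 0.04, up, up, up))  # 0.49 / pi, about 0.156
```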
One thing that is difficult to get right in 3D rendering is gamma. Colour values can live in one of two main spaces: gamma space or linear space. Lighting calculations should be done in linear space, but most monitors and other output devices expect gamma-space input. The two things to look out for are banding and incorrect gamma compensation.
Banding happens when we quantise from a higher precision to a lower precision and then do further computations on the result. To avoid it, we want to keep the higher precision while doing the computations and quantise as late as possible.
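The effect is easy to reproduce. The sketch below takes a dark gradient, brightens it by a hypothetical exposure factor of 4 (the kind of scaling that lighting or tone mapping might apply), and counts how many distinct 8-bit output levels survive when the input is quantised before the maths compared with after it.

```python
def quantise(x, bits=8):
    """Clamp to [0, 1] and round to the nearest representable unorm value."""
    levels = (1 << bits) - 1
    return round(min(max(x, 0.0), 1.0) * levels) / levels


def banding_demo(exposure=4.0, samples=4096):
    """Count distinct 8-bit output levels for a dark ramp scaled by `exposure`."""
    ramp = [i / (samples - 1) * 0.25 for i in range(samples)]  # dark input, 0..0.25
    early = {quantise(quantise(x) * exposure) for x in ramp}    # quantise, then compute
    late = {quantise(x * exposure) for x in ramp}               # compute, then quantise
    return len(early), len(late)


print(banding_demo())  # (65, 256)
```

Quantising first leaves only 65 distinct steps across the whole output range, which shows up as visible bands; keeping full precision until the final write preserves all 256.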
Input textures are the main place where incorrect gamma compensation creeps in. Make sure you know which colour space each texture is stored in, and apply the correct gamma compensation when sampling it. Texture filtering also has to take gamma into account: texels should be converted to linear space before they are filtered. For textures in the efficient PVRTC format, this can be achieved with the EXT_pvrtc_sRGB extension.
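The sketch below shows why the order matters. It uses the standard sRGB decoding curve and averages a black and a white texel, as bilinear filtering would halfway between them. Decoding each texel before filtering, which is what an sRGB texture format is intended to give you, corresponds to the second function.

```python
def srgb_to_linear(c):
    """Standard sRGB decoding of a value in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def filter_then_decode(a, b):
    """Wrong order: average the gamma-encoded texels, then decode."""
    return srgb_to_linear((a + b) / 2)


def decode_then_filter(a, b):
    """Right order: decode each texel to linear light, then average."""
    return (srgb_to_linear(a) + srgb_to_linear(b)) / 2


print(filter_then_decode(0.0, 1.0))  # about 0.214: too dark
print(decode_then_filter(0.0, 1.0))  # 0.5: half the light of the white texel
```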
In the demo we wanted many moving lights, so we used deferred rendering. We still wanted the same inputs for our PBR shading, so we started with the following G-buffer layout:
- RGB Light accumulation
- F32 Depth
- RGB Tangent normals
- 0-1 Roughness
- RGB Albedo
- RGB Specular colour
- 0-1 Metallicness
Metallicness vs Specular
We can reduce the number of inputs, and therefore the size of the G-buffer, by assuming that all dielectrics (non-metals) share the same specular colour and that metals have no diffuse colour. The albedo colour of a dielectric and the specular colour of a metal can then be stored in the same image, with the metallicness value telling the shader how to interpret it. This is the metallicness workflow we use below.
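In code, the metallicness workflow unpacks the shared colour into a diffuse colour and a specular colour (F0) roughly as follows. The 0.04 reflectance used for dielectrics is a widely used approximation, not a value taken from the demo.

```python
DIELECTRIC_F0 = 0.04  # assumed specular reflectance shared by all dielectrics


def lerp(a, b, t):
    return a + (b - a) * t


def unpack_metallic(base_colour, metallic):
    """Split one stored RGB colour into (diffuse, specular) using metallicness.

    metallic = 0: dielectric, base colour is the albedo, specular is DIELECTRIC_F0.
    metallic = 1: metal, no diffuse, base colour is the specular reflectance.
    """
    diffuse = tuple(c * (1.0 - metallic) for c in base_colour)
    specular = tuple(lerp(DIELECTRIC_F0, c, metallic) for c in base_colour)
    return diffuse, specular


print(unpack_metallic((0.8, 0.2, 0.1), 0.0))  # ((0.8, 0.2, 0.1), (0.04, 0.04, 0.04))
```

Values between 0 and 1 blend the two interpretations, which is useful where filtering or anti-aliasing mixes metal and non-metal texels.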
This gives us a final G-buffer layout that looks like the following:
- rgb10a2: Light accumulation
- r32f: Depth
- rgba8: Normals & roughness
- rgba8: (Albedo or reflectance) & metallicness
This fits nicely into the 128 bits of on-chip memory we have available per pixel for our PowerVR Series6 target device. (Note that 256 bits are available in PowerVR Series6XT and later GPUs.)
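A quick way to check a layout against the on-chip budget is to add up its bits per pixel. The sketch below does this for the final layout above; the 128-bit and 256-bit limits are the Series6 and Series6XT figures quoted in the text, and the format table only covers the formats used here.

```python
FORMAT_BITS = {
    "rgb10a2": 10 + 10 + 10 + 2,
    "r32f": 32,
    "rgba8": 4 * 8,
}

FINAL_GBUFFER = [
    ("light accumulation", "rgb10a2"),
    ("depth", "r32f"),
    ("normals & roughness", "rgba8"),
    ("(albedo or reflectance) & metallicness", "rgba8"),
]

SERIES6_BITS = 128    # per-pixel on-chip budget quoted for PowerVR Series6
SERIES6XT_BITS = 256  # quoted for Series6XT and later


def bits_per_pixel(layout):
    """Total bits per pixel for a list of (name, format) pairs."""
    return sum(FORMAT_BITS[fmt] for _, fmt in layout)


print(bits_per_pixel(FINAL_GBUFFER))  # 128
```

Storing the specular colour in a separate rgba8 target would raise the total to 160 bits, over the Series6 budget, which is why the metallicness workflow matters on that hardware.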
Because the light accumulation is stored in HDR format, we can apply many different post-processing effects as a last step before drawing to the back buffer. In the demo we used bloom, tone mapping, lens flare and a film grain effect. Many physically based renderers use post-processing to tune the final image to an artist's taste.
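As an example of the tone-mapping step, the sketch below applies the simple Reinhard operator x / (1 + x) with an exposure multiplier, then sRGB-encodes and quantises the result for an 8-bit back buffer. Reinhard is only a stand-in here; the post does not say which tone-mapping curve the demo uses.

```python
def reinhard(x, exposure=1.0):
    """Compress HDR linear values in [0, inf) into [0, 1)."""
    x *= exposure
    return x / (1.0 + x)


def linear_to_srgb(l):
    """Standard sRGB encoding of a linear value in [0, 1]."""
    if l <= 0.0031308:
        return l * 12.92
    return 1.055 * l ** (1.0 / 2.4) - 0.055


def tonemap_to_8bit(hdr, exposure=1.0):
    """HDR linear value -> tone-mapped, gamma-encoded 8-bit value."""
    # Quantisation happens last, so the maths above runs at full precision.
    return round(linear_to_srgb(reinhard(hdr, exposure)) * 255)


print([tonemap_to_8bit(x) for x in (0.0, 0.1, 1.0, 10.0)])
```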
In a traditional deferred renderer, the GPU writes the G-buffer out to main memory, reads it back in for the lighting pass and then writes the light accumulation to another render target in main memory. This works well for desktop GPUs, which have many watts of power available and large amounts of fast, dedicated but power-hungry DDR memory.
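To get a feel for the numbers, the sketch below estimates the main-memory traffic of the G-buffer round trip alone. It assumes, purely for illustration, a 1920x1080 target at 60 frames per second and a 128-bit (16-byte) G-buffer that is written once and read back once per frame. It ignores overdraw, caches and framebuffer compression, so real figures will differ.

```python
def gbuffer_round_trip_gb_per_s(width, height, fps, bytes_per_pixel):
    """Main-memory traffic in GB/s (1 GB = 1e9 bytes) to write and re-read a G-buffer."""
    per_frame = width * height * bytes_per_pixel * 2  # one write + one read per pixel
    return per_frame * fps / 1e9


print(gbuffer_round_trip_gb_per_s(1920, 1080, 60, 16))  # about 3.98
```

On a tile-based GPU that keeps the G-buffer on chip, this part of the traffic can in principle disappear entirely.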
On power-constrained devices, however, this becomes a problem. For power reasons, a mobile GPU may only have access to comparatively slow system memory, so we want to use as little main memory bandwidth as possible.
Because PowerVR GPUs render in tiles, they have a small amount of fast, dedicated on-chip memory for the pixels of the tile being processed: either 128 or 256 bits per pixel, depending on the hardware. A deferred renderer can use this memory to avoid consuming main memory bandwidth. The following OpenGL ES extensions let us do this.
Framebuffer fetch
The framebuffer fetch extension lets a fragment shader read back the value that a previous shader wrote to the same pixel. Because this all happens while the shaders and triangles for a tile are being processed, these writes and reads go to on-chip memory and are faster than accesses to system memory. There is a good explanation of this extension here.
Pixel local storage (PLS)
The pixel local storage extension is much like the framebuffer fetch extension, but it also lets us specify the format of each intermediate variable stored in on-chip memory, so fewer format conversions are needed. We used the following layout in the demo:
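```glsl
#extension GL_EXT_shader_pixel_local_storage : require

// 4 x 32 bits = 128 bits per pixel; block and instance names are illustrative.
__pixel_localEXT FragDataLocal {
    layout(rgb10_a2) highp vec4 lightAccumulation_padding;
    layout(r32f) highp float depth;
    layout(rgba8) highp vec4 normals_roughness;
    layout(rgba8) highp vec4 albedoOrReflectance_metallicness;
} pls;
```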
The pixel local storage 2 extension lets us write the final result to the back buffer instead of to an intermediate buffer. Without it, we would have to render to an intermediate render target in system memory and then blit that to the back buffer; writing directly to the back buffer avoids this extra use of main memory bandwidth.
An analysis of our demo using the PVRTune tool shows that the optimised renderers (Framebuffer Fetch and PLS) execute fewer rendering tasks.
This means that the G-buffer generation, lighting and tone-mapping stages are properly merged into one task. The analysis also shows a clear reduction in memory bandwidth between on-chip memory and main memory: a 53% decrease in reads and a 54% decrease in writes.
All these optimisations result in a slightly lower frame time but much lower power consumption, as shown below. This means longer battery life on mobile devices.
Vulkan lets us specify the load and store operations for each render pass attachment explicitly, so attachments whose contents are only needed within a render pass never have to be loaded from or stored to main memory. When we ran our Gnome Horde demo on both Vulkan and OpenGL ES, the Vulkan version used the CPU much more efficiently. This is good news for mobile devices: less CPU usage means less power, which means longer battery life and more room for features. I will be looking at combining PLS and Vulkan in a later article.
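The sketch below is a simplified model of why explicit load and store operations matter on a tiled GPU. Each attachment's load op decides whether its contents are read from main memory at the start of a render pass, and its store op whether they are written back at the end. The op names mirror Vulkan's VkAttachmentLoadOp and VkAttachmentStoreOp values, but the model itself, the 1920x1080 resolution and the 4-bytes-per-pixel attachment sizes are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    name: str
    bytes_per_pixel: int
    load_op: str   # "LOAD", "CLEAR" or "DONT_CARE", like VkAttachmentLoadOp
    store_op: str  # "STORE" or "DONT_CARE", like VkAttachmentStoreOp


def main_memory_bytes(attachments, width, height):
    """Bytes moved between tile memory and main memory for one render pass."""
    total = 0
    for a in attachments:
        size = a.bytes_per_pixel * width * height
        if a.load_op == "LOAD":
            total += size  # previous contents must be read into the tile
        if a.store_op == "STORE":
            total += size  # tile contents are written back to main memory
    return total


GBUFFER_NAMES = ["light accumulation", "depth", "normals & roughness", "albedo & metallicness"]

# G-buffer lives only in tile memory: cleared at the start, discarded at the end.
tiled = [Attachment(n, 4, "CLEAR", "DONT_CARE") for n in GBUFFER_NAMES]
tiled.append(Attachment("back buffer", 4, "CLEAR", "STORE"))

# Naive setup: every attachment is stored to main memory.
naive = [Attachment(n, 4, "CLEAR", "STORE") for n in GBUFFER_NAMES]
naive.append(Attachment("back buffer", 4, "CLEAR", "STORE"))

print(main_memory_bytes(tiled, 1920, 1080))  # 8294400
print(main_memory_bytes(naive, 1920, 1080))  # 41472000
```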
Image-based lighting is an approximation. If a light moves in the scene, the offline-rendered cubemaps become incorrect, and recomputing them is expensive. If we used ray tracing to generate these IBL probes, we could have completely dynamic IBL at a fraction of the cost of recomputing them with rasterisation.
Imagine the low CPU overhead of Vulkan issuing commands to a power-efficient PowerVR GPU that uses ray tracing to achieve fully dynamic lighting and shadows!
This is something we are looking at right now and I hope to give you more information soon.