This blog post is a must-read for every Unreal Engine 4 developer who wants to learn how to optimise for PowerVR platforms specifically, as well as for mobile in general. We’ve collected the most relevant tips from our PowerVR Performance Recommendations document to help you boost your Unreal performance.
For the purposes of this blog post, we used Unreal Engine 4.16.3 for analysis, with the minimal starter project set to mobile scope and maximum quality. It is important to select the mobile version to guarantee that the mobile-optimised materials and effects are used.
The Rendering Settings screen is the central hub for all performance-related settings. Note, however, that the global “post process volume” and other local settings can override it.
Unreal Engine 4 offers a great number of options for scaling performance. Most of them trade rendering quality for speed, but some features may not be needed at all in certain use cases, in which case they can simply be turned off.
There are several useful mobile settings. One of them is High Dynamic Range (HDR) lighting. HDR enables the lighting system to display a wide intensity range, such as the sun and a desk lamp in the same dynamic scene. If this option is disabled, high-intensity lighting might look flat or overexposed. Therefore, if your application does not have very intense lighting, or the lighting intensity range is fairly uniform, you might get away with disabling HDR rendering. Note that disabling HDR also makes post-processing effects unavailable.
Cascaded Shadow Mapping (CSM) uses multiple shadow maps along the camera’s view axis so that large scenes can get high-resolution shadows. If a scene is not very large, however, it might be a good idea to reduce the number of CSM cascades to the lowest count that still gives acceptable shadow quality.
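The effect of the cascade count can be illustrated with a common way of placing cascade boundaries, the “practical split scheme”, which blends logarithmic and uniform splits of the view range. This is a minimal sketch under that assumption: Unreal computes its own cascade distribution from the light’s settings, and the `blend` weight and distances below are made-up values.

```python
from typing import List


def cascade_splits(near: float, far: float, cascades: int, blend: float = 0.75) -> List[float]:
    """Far distance of each cascade, blending logarithmic and uniform splits of [near, far]."""
    splits = []
    for i in range(1, cascades + 1):
        t = i / cascades
        log_split = near * (far / near) ** t      # packs cascades close to the camera
        uniform_split = near + (far - near) * t   # spaces cascades evenly
        splits.append(blend * log_split + (1.0 - blend) * uniform_split)
    return splits


for count in (4, 2):
    print(count, "cascades:", [round(d, 1) for d in cascade_splits(1.0, 100.0, count)])
```

With these numbers, the first of four cascades ends at about 8.8 units while the first of two ends at about 20.1, so each shadow-map texel is stretched over more of the scene. In a small scene that loss may not be noticeable.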
Vertex fogging calculates the amount of light per vertex obscured by fog at a certain distance. If your game has no fog or very little fog, it’s recommended to turn this feature off as it saves some cycles in the vertex shaders.
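Per-vertex fogging can be sketched with plain exponential fog: the fog amount is computed once per vertex and the lit colour is blended towards the fog colour, leaving the rasteriser to interpolate the result. This is an assumption for illustration only; Unreal’s exponential height fog also depends on height, so the formula below is not the engine’s.

```python
import math


def fog_amount(distance: float, density: float) -> float:
    """Fraction of light obscured by fog at `distance` (0 = clear, 1 = fully fogged)."""
    return 1.0 - math.exp(-density * distance)


def fog_vertex(colour, position, camera, fog_colour, density):
    """Per-vertex fogging: one exponential and one blend per vertex; the GPU then
    interpolates the fogged colour across the triangle instead of recomputing it per pixel."""
    amount = fog_amount(math.dist(position, camera), density)
    return tuple(c * (1.0 - amount) + f * amount for c, f in zip(colour, fog_colour))


print(fog_vertex((1.0, 0.5, 0.2), (0.0, 0.0, 40.0), (0.0, 0.0, 0.0), (0.6, 0.6, 0.7), 0.01))
```

If no fog is visible, every vertex still pays for this calculation, which is the saving the setting recovers when it is turned off.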
Multi-Sampling Anti-Aliasing (MSAA)
Unreal Engine 4 provides a dedicated setting for mobile MSAA. MSAA is not applied to post-processing surfaces, so its cost should not be very high. 2x MSAA costs less than 4x MSAA but its quality is not as good, while 8x MSAA is overkill for the quality it provides. In practice, the choice is therefore usually between no MSAA and 4x MSAA. On low-end hardware, we recommend disabling MSAA completely: its cost also depends on the selected resolution and the number of visible triangles, and it can be prohibitive on those devices.
You can also choose from alternative anti-aliasing methods such as Fast Approximate Anti-Aliasing (FXAA) and Temporal Anti-Aliasing (TAA). Since FXAA and TAA are post-process filters, they come with a high base cost that might not be affordable if the application is already limited by pixel processing. FXAA is an approximate spatial anti-aliasing technique that detects edges in the image from perceived colour differences and filters pixels along those edges. TAA is a temporal technique that applies a sub-pixel jitter to vertices each frame so that multiple samples per pixel can be gathered over time. It keeps ghosting artefacts to a minimum by using motion vectors and reprojection to correct for movement between frames.
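To see how MSAA work scales with resolution and sample count, the sketch below counts the samples produced per frame and the size they would occupy, assuming 4 bytes of colour plus 4 bytes of depth/stencil per sample (an illustrative figure). A tile-based GPU such as PowerVR can keep these samples in on-chip tile memory and resolve them before writing the tile out, which helps keep 4x affordable, but the number of samples to process still grows linearly.

```python
def msaa_samples(width: int, height: int, samples: int) -> int:
    """Samples the GPU must generate and resolve per frame."""
    return width * height * samples


def sample_storage_mib(width: int, height: int, samples: int, bytes_per_sample: int = 8) -> float:
    """Size of all samples, assuming 4 bytes of colour plus 4 bytes of depth/stencil each."""
    return msaa_samples(width, height, samples) * bytes_per_sample / (1024 * 1024)


for s in (1, 2, 4, 8):
    print(f"{s}x at 1920x1080: {msaa_samples(1920, 1080, s):,} samples, "
          f"{sample_storage_mib(1920, 1080, s):.1f} MiB of sample data")
```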
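A common way to produce TAA’s sub-pixel jitter is to offset the projection by a different low-discrepancy point every frame, for example from the Halton (2, 3) sequence. The sketch below assumes that approach and an 8-frame cycle; both are illustrative choices, not a description of Unreal’s exact sample pattern.

```python
def halton(index: int, base: int) -> float:
    """index-th value (index >= 1) of the Halton low-discrepancy sequence, in [0, 1)."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result


def taa_jitter(frame: int, cycle: int = 8):
    """Sub-pixel offset in pixels, within [-0.5, 0.5), repeating every `cycle` frames."""
    i = frame % cycle + 1   # start at index 1: index 0 would always give (0, 0)
    return halton(i, 2) - 0.5, halton(i, 3) - 0.5


def jitter_to_clip(offset_px, width: int, height: int):
    """Convert a pixel offset into the clip-space translation added to the projection
    (one pixel spans 2 / width units of normalised device coordinates)."""
    return 2.0 * offset_px[0] / width, 2.0 * offset_px[1] / height


for frame in range(4):
    print(frame, taa_jitter(frame), jitter_to_clip(taa_jitter(frame), 1920, 1080))
```

Because every vertex is shifted by the same sub-pixel amount, each frame samples a slightly different point inside every pixel, and accumulating the reprojected history over the cycle approximates supersampling.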
On mobile platforms, textures do not need a very high resolution for surfaces to look good. The same applies to “reflection capture” resolution, so choose a resolution that still looks good enough for your game. This reduces the amount of work on the GPU without severely affecting visual quality.
On tile-based mobile architectures in general, it is very important that a render target is cleared before it is rendered to. If it is not cleared, the GPU has to load the previous contents of the render target from external memory into tile memory, which is an expensive operation. Therefore, make sure that “Hardware clear” is always selected for the clear scene option.
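The cost of skipping the clear can be estimated with simple arithmetic: without a clear, every pass start reads the whole previous target back from external memory. In Vulkan terms this is the difference between `VK_ATTACHMENT_LOAD_OP_CLEAR` (or `DONT_CARE`) and `VK_ATTACHMENT_LOAD_OP_LOAD`. The sketch assumes one pass per frame and 8 bytes per pixel (colour plus depth/stencil); both are illustrative.

```python
def pass_start_reads(width: int, height: int, bytes_per_pixel: int, cleared: bool) -> int:
    """Bytes read from external memory when a render pass starts: nothing if the
    target is cleared, the whole previous contents if they must be preserved."""
    return 0 if cleared else width * height * bytes_per_pixel


def wasted_mb_per_second(width: int, height: int, bytes_per_pixel: int, fps: int) -> float:
    """Extra read traffic per second caused by loading instead of clearing."""
    return pass_start_reads(width, height, bytes_per_pixel, cleared=False) * fps / 1e6


# Colour (4 bytes) + depth/stencil (4 bytes) at 1080p and 60 fps
print(f"{wasted_mb_per_second(1920, 1080, 8, 60):.0f} MB/s avoidable")
```

Under these assumptions, roughly 1 GB/s of memory reads can be avoided just by clearing.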
Large particles can have a very high fillrate cost. One way to reduce it is to cut out the empty parts of a particle texture with a bounding mesh, so that the transparent regions of the particle are not rendered at all.
Unreal Engine 4 on mobile always uses forward shading. One of the more general optimisations for forward shading is to lower the number of lights allowed per pixel. The number of lights directly affects the GPU workload, as each object in the scene has to be rendered every time a light touches it: one pass per light for each mesh, capped by the “Max Movable Point Lights” setting. If more lights affect a mesh than the cap allows, only the most important ones are rendered for that mesh. If the lights are evenly distributed and don’t overlap, you might be able to increase the total number of lights active at any given time.
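The saving from trimming empty texels can be estimated from the particle’s alpha mask. This sketch uses the tightest axis-aligned rectangle around the non-transparent texels as a simplification; a tighter bounding polygon, such as the bounding mesh described above, trims even more. The 8x8 mask is a made-up example.

```python
def opaque_bounds(alpha, threshold=0):
    """Smallest inclusive rectangle (x0, y0, x1, y1) containing every texel whose alpha
    exceeds `threshold`, or None if the texture is completely empty."""
    rows = [y for y, row in enumerate(alpha) if any(a > threshold for a in row)]
    cols = [x for x in range(len(alpha[0])) if any(row[x] > threshold for row in alpha)]
    if not rows:
        return None
    return cols[0], rows[0], cols[-1], rows[-1]


def fill_saving(alpha, threshold=0):
    """Fraction of the full quad's pixels that no longer need shading after trimming."""
    h, w = len(alpha), len(alpha[0])
    bounds = opaque_bounds(alpha, threshold)
    if bounds is None:
        return 1.0
    x0, y0, x1, y1 = bounds
    return 1.0 - (x1 - x0 + 1) * (y1 - y0 + 1) / (w * h)


# 8x8 alpha mask with a small opaque "puff" in the middle
mask = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(3, 6):
        mask[y][x] = 255
print(opaque_bounds(mask), fill_saving(mask))
```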
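The one-pass-per-light model described above, with a per-mesh cap such as “Max Movable Point Lights”, can be sketched as follows. The importance ranking (intensity divided by squared distance) and the example lights are hypothetical; the paragraph above does not say how Unreal ranks lights.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PointLight:
    name: str
    position: Vec3
    intensity: float
    radius: float


def _dist2(a: Vec3, b: Vec3) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lights_for_mesh(mesh_pos: Vec3, lights: List[PointLight], max_lights: int) -> List[PointLight]:
    """Lights whose radius reaches the mesh, most important first, cut at the cap.
    Importance is a hypothetical heuristic: intensity / squared distance."""
    touching = [l for l in lights if _dist2(l.position, mesh_pos) <= l.radius ** 2]
    touching.sort(key=lambda l: l.intensity / max(_dist2(l.position, mesh_pos), 1e-6), reverse=True)
    return touching[:max_lights]


def light_passes(mesh_positions: List[Vec3], lights: List[PointLight], max_lights: int) -> int:
    """Total mesh-per-light passes under the one-pass-per-light model."""
    return sum(len(lights_for_mesh(p, lights, max_lights)) for p in mesh_positions)


scene_lights = [PointLight("A", (0, 0, 0), 10, 5), PointLight("B", (3, 0, 0), 10, 5),
                PointLight("C", (20, 0, 0), 100, 5)]
meshes = [(1, 0, 0), (19, 0, 0)]
print(light_passes(meshes, scene_lights, max_lights=4), light_passes(meshes, scene_lights, max_lights=1))
```

Here the first mesh is touched by two overlapping lights and the second by one, so a cap of 1 drops the pass count from 3 to 2. Lights that do not overlap (like C) cost the same regardless of the cap.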
Disabling Pre-Z Pass
Unreal Engine 4 provides a handy optimisation for desktop platforms called “Early Z Pass”. It works by rendering the scene depth for each pixel in a pre-pass, so that during the actual rendering, fragments of hidden primitives can be skipped. This vastly reduces overdraw.
PowerVR is a Tile Based Deferred Rendering (TBDR) architecture, so a depth pre-pass is counter-productive: the GPU’s Hidden Surface Removal (HSR) already rejects occluded pixels before they are shaded. Enabling “Early Z Pass” causes redundant work, because depth testing is performed twice and the depth buffer is saved to memory.
To prevent Unreal from doing this, make sure “Occlusion Culling” is disabled, “DBuffer Decals” are disabled and “Early Z Pass” is set to None. All three settings must be changed, because both DBuffer decals and occlusion culling force the Pre-Z pass on. The following images show how to do this:
Disabling Occlusion Culling
Disabling DBuffer Decals
Disabling Early Z-pass
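The difference between a depth pre-pass and PowerVR’s HSR can be shown with a simplified per-pixel model: a pixel covered by several opaque layers, submitted in a given order. The strategy labels are hypothetical, and the model leaves out the extra depth-buffer write-back that the pre-pass also causes.

```python
def pixel_work(depths, strategy):
    """Depth tests and fragment-shader runs for one pixel covered by opaque layers
    (smaller depth = closer), submitted in list order. A simplified model."""
    n = len(depths)
    if strategy == "no_depth_test":
        return {"depth_tests": 0, "shaded": n}
    if strategy == "early_z":
        # Immediate-mode early-Z only rejects layers behind what has already been drawn.
        shaded, nearest = 0, float("inf")
        for d in depths:
            if d < nearest:
                nearest, shaded = d, shaded + 1
        return {"depth_tests": n, "shaded": shaded}
    if strategy == "depth_prepass":
        # Depth-only pass over every layer, then a colour pass that tests them again
        # and shades only the visible layer.
        return {"depth_tests": 2 * n, "shaded": min(n, 1)}
    if strategy == "hsr":
        # TBDR: visibility is resolved for the tile before opaque fragments are shaded,
        # so each layer is tested once and only the visible one is shaded.
        return {"depth_tests": n, "shaded": min(n, 1)}
    raise ValueError(f"unknown strategy: {strategy}")


back_to_front = [0.9, 0.5, 0.2]
for s in ("no_depth_test", "early_z", "depth_prepass", "hsr"):
    print(s, pixel_work(back_to_front, s))
```

On an immediate-mode GPU the pre-pass trades extra depth tests for fewer shaded fragments; with HSR the shading saving is already there, so the pre-pass only adds work.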
Android Platform Settings
Always use texture compression. It not only saves storage space but also saves bandwidth at runtime, which makes it one of the best ways to increase performance and save battery life. Compressed textures stay compressed in memory until the very moment they are needed to process a fragment, when the texture hardware decompresses the texels being sampled.
Unreal Engine 4 supports a variety of texture compression methods. By default Unreal selects the texture compression method that is available on the target platform and has the highest priority; however, you can override it as in the screenshot below:
The compression format options are as follows:
- ETC1 is a texture compression format supported on all devices. It has been superseded by ETC2 in terms of quality and size. While ETC1 is simple and widely supported, it doesn’t support an alpha channel and its compression ratio is modest.
- PVRTC is a texture compression format supported exclusively by PowerVR hardware. It supports an alpha channel, has one of the best size-to-quality ratios and is highly configurable to match your quality/size needs.
- ASTC is an open format supported by most platforms. It supports an alpha channel, has a compression ratio comparable to PVRTC and similar configurability.
- DXT is a compression format widely supported on desktop. On mobile, due to licensing issues, it is only supported by NVIDIA Tegra devices.
- ATC is a texture compression format supported only on Qualcomm Adreno devices.
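The size difference between some of these formats follows directly from their bits per pixel: ETC1 and PVRTC 4bpp store 4 bits per pixel, PVRTC 2bpp stores 2, DXT1 and DXT5 store 4 and 8, and ASTC always uses 128-bit blocks, so its rate depends on the block size. The sketch ignores padding to whole blocks and minimum texture sizes.

```python
# Bits per pixel (bpp) of some of the formats above. ASTC always uses 128-bit blocks,
# so its bpp is 128 / (block width * block height).
BITS_PER_PIXEL = {
    "RGBA8 (uncompressed)": 32,
    "ETC1 (RGB only)": 4,
    "PVRTC 4bpp": 4,
    "PVRTC 2bpp": 2,
    "ASTC 4x4": 128 / (4 * 4),
    "ASTC 8x8": 128 / (8 * 8),
    "DXT1": 4,
    "DXT5": 8,
}


def top_level_kib(width: int, height: int, bpp: float) -> float:
    """Size of the top mip level in KiB (ignores padding to whole blocks)."""
    return width * height * bpp / 8 / 1024


for name, bpp in BITS_PER_PIXEL.items():
    print(f"{name:22} {top_level_kib(1024, 1024, bpp):7.0f} KiB")
```

A 1024x1024 RGBA8 texture takes 4096 KiB, against 512 KiB in PVRTC 4bpp, which is also roughly the ratio of memory traffic saved each time it is sampled.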
You can override the priority numbers as seen here:
You can also set PVRTC texture compression quality in the engine cooker section as shown:
The “fastest” setting refers to the speed of compression. Choosing higher compression quality could make the compressor take a very long time to process the texture.
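The default selection rule described at the start of this section (use the highest-priority format that the target device supports) can be sketched as below. The priority numbers and the device’s format list are illustrative placeholders, not necessarily Unreal’s defaults, and the format names are plain strings rather than engine identifiers.

```python
from typing import Dict, Set


def pick_format(device_formats: Set[str], priorities: Dict[str, float]) -> str:
    """Highest-priority packaged format that the device can decode."""
    candidates = [fmt for fmt in priorities if fmt in device_formats]
    if not candidates:
        raise ValueError("device supports none of the packaged formats")
    return max(candidates, key=lambda fmt: priorities[fmt])


priorities = {"ASTC": 0.9, "PVRTC": 0.8, "DXT": 0.6, "ATC": 0.5, "ETC2": 0.2, "ETC1": 0.1}
example_device = {"ETC1", "ETC2", "PVRTC"}   # hypothetical device without ASTC support
print(pick_format(example_device, priorities))  # PVRTC

# Overriding a priority changes the choice without changing the device.
print(pick_format(example_device, {**priorities, "PVRTC": 0.05}))  # ETC2
```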
Changing rendering API
Recent Android devices, including those with PowerVR GPUs, can use the Vulkan API. Vulkan has significant benefits: it can reduce CPU load and gives you more control over synchronisation. It also makes better use of multi-core CPUs, as command buffers can be recorded on multiple threads before being submitted to the GPU.
When several APIs are selected, Unreal will try to use the most advanced of the selected APIs. If you want to drop support for devices that don’t support Vulkan, leave only Vulkan checked. Here’s how to set it:
Device profiles are a really good way to get fine-grained control over rendering settings for each target device, letting you fine-tune Unreal Engine 4’s features for a specific device. They are a great way to handle many target devices.
Device profiles are hierarchical: settings in the Android profile apply to all Android devices, while per-SoC settings can be specified in separate device profiles that build on it.
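The inheritance can be sketched as a walk from a profile up to its root, applying settings from the root down so that children override their parents. In the engine, profiles are written in a device profiles .ini file, where each profile names its parent with `BaseProfileName` and sets console variables with `+CVars=` lines. The profile names below are hypothetical and the values illustrative.

```python
from typing import Dict, Optional, Tuple

# name -> (parent profile, console variables set by this profile).
# Profile names are hypothetical; values are illustrative.
PROFILES: Dict[str, Tuple[Optional[str], Dict[str, str]]] = {
    "Android": (None, {"r.MobileContentScaleFactor": "1.0", "r.MobileMSAA": "1"}),
    "Android_PowerVR": ("Android", {"r.MobileMSAA": "4"}),
    "Android_PowerVR_Low": ("Android_PowerVR", {"r.MobileContentScaleFactor": "0.75"}),
}


def resolve(profile: str) -> Dict[str, str]:
    """Effective settings for a profile: collect the chain up to the root, then apply
    each profile's values from the root down so children override their parents."""
    chain = []
    name: Optional[str] = profile
    while name is not None:
        chain.append(name)
        name = PROFILES[name][0]
    settings: Dict[str, str] = {}
    for name in reversed(chain):
        settings.update(PROFILES[name][1])
    return settings


print(resolve("Android_PowerVR_Low"))
```

Only the settings that differ need to appear in a child profile; everything else is inherited.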
Visualisation methods and profiling tools
Unreal Engine 4 provides many tools to help developers understand what is happening under the hood. One of the best is the GPU Visualizer, which shows exactly which effects are rendered and how long each one takes to render on desktop. Developers can use this to decide which effects to keep and which to turn off.
There are other important views for optimisation.
Shader complexity view
The Shader Complexity view mode colours the scene according to the complexity of the shaders used in the materials. This helps developers identify materials that are too costly relative to the rest of the scene.
Mesh LOD view
Another important view is the mesh Level of Detail (LOD). This view colours the scene according to the complexity of the mesh LODs displayed on screen. This helps developers identify meshes that have too dense geometry compared to the rest of the scene.
Per mesh settings
Mesh Level of Detail (LOD) is a fantastic tool for managing geometry complexity. You can use it to swap detailed geometry for less detailed versions as the camera moves further away from a given object. This way the amount of geometry on screen is never more than necessary, while quality remains sufficient.
Note that mesh LODs are a great way to instantly optimise geometry content for mobile. Just set a higher LOD bias so that lower resolution meshes are used by default. Reducing the geometry workload helps to reduce the amount of computation needed, resulting in potentially cooler devices and longer battery life.
Unreal Engine 4 offers a number of customisation options. Developers can set the number of LOD levels per mesh and the percentage of triangles that should remain at each LOD level. Unreal then generates the LOD levels automatically and swaps them in and out whenever appropriate.
Per material settings
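LOD selection with a bias can be sketched as follows, assuming each LOD has a minimum on-screen size and a percentage of the base mesh’s triangles. The thresholds, percentages and the integer bias are example values, not Unreal’s defaults or its exact selection logic.

```python
from typing import List

# Per-LOD settings for one mesh: minimum on-screen size for each LOD (descending)
# and the share of the base mesh's triangles kept at that LOD. Values are examples.
SCREEN_SIZE = [0.5, 0.25, 0.1, 0.0]
TRIANGLE_PERCENT = [100.0, 50.0, 25.0, 12.5]


def select_lod(screen_size: float, thresholds: List[float], lod_bias: int = 0) -> int:
    """First LOD whose threshold the mesh still meets, shifted by a bias;
    a positive bias picks coarser LODs earlier."""
    lod = len(thresholds) - 1
    for i, threshold in enumerate(thresholds):
        if screen_size >= threshold:
            lod = i
            break
    return max(0, min(lod + lod_bias, len(thresholds) - 1))


def triangles_drawn(base_triangles: int, lod: int) -> int:
    return round(base_triangles * TRIANGLE_PERCENT[lod] / 100.0)


for size in (0.6, 0.3, 0.05):
    for bias in (0, 1):
        lod = select_lod(size, SCREEN_SIZE, bias)
        print(f"screen size {size}, bias {bias}: LOD{lod}, {triangles_drawn(20000, lod)} triangles")
```

A bias of 1 halves the triangle count for a close-up mesh here, which is the “instant” mobile optimisation the LOD bias provides.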
Amongst several other options, blending mode can be chosen for each material. By default, materials use the opaque blending mode which allows PowerVR hardware to take advantage of its Hidden Surface Removal (HSR) capabilities. This enables PowerVR devices to have zero overdraw for opaque primitives, which saves tremendous computation time and bandwidth.
For transparent surfaces, there are two important blending modes. The translucent mode uses the alpha values of the colour textures to determine how translucent a mesh is at a given point. The meshes are rendered back to front and composited on top of each other. This works fine on PowerVR.
On the other hand, masked materials use a cutoff value to discard pixels that are not opaque enough. This mode works better with a number of post-process effects that rely on depth values. However, this operation might have a higher cost than standard translucency on any architecture that uses early depth test optimisations.
On PowerVR, masked-mode (alpha-tested/discard) primitives cannot write to the depth buffer until the fragment shader has executed and the fragment’s visibility is known. These deferred depth writes can impact performance, as subsequent primitives cannot be processed until the depth buffer has been updated with the alpha-tested primitive’s values.
Therefore, we recommend using the “Translucent” mode wherever possible instead of the “Masked” mode.
While all mobile architectures are great at half-precision computation, PowerVR is exceptionally good at it, so it makes sense to use it whenever possible. Using half-precision (FP16) in shaders can result in a significant improvement in performance over high precision (FP32). This is due to the dedicated FP16 Sum Of Products (SOP) arithmetic pipeline, which can perform two SOP operations in parallel per cycle, theoretically doubling the throughput of floating point operations. The FP16 SOP pipeline is available on most PowerVR Rogue graphics cores.
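“Whenever possible” depends on whether FP16’s precision is enough for a value. Half precision has 10 explicit significand bits (11 with the implicit bit) and a largest finite value of 65504, which is usually fine for colours, normals and most lighting terms but not for large values such as world-space positions. The sketch emulates the rounding on the CPU with Python’s `struct` half-float format; GPU rounding may differ in detail, and values beyond 65504 raise `OverflowError` here.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a value to IEEE 754 half precision and back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


for value in (0.5, 0.1, 2048.0, 2049.0):
    print(value, "->", to_fp16(value))
```

Colour-range values keep about three significant decimal digits, while integers above 2048 can no longer be represented exactly.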
Just as there’s a LOD solution for meshes (mesh LOD), there’s a LOD solution for textures called mipmapping. Mipmapping works by automatically lowering the displayed texture resolution based on the size of the covered screen area. This can significantly improve cache efficiency, increase performance and reduce bandwidth. Unreal usually sets up mipmaps for you automatically and lets you set a LOD bias to control texture quality. Keep in mind that mipmaps are only needed for textures displayed smaller than their native size, such as those on 3D elements in the scene. 2D elements such as UI that are mapped 1:1 to the screen don’t need them; if those elements are scaled down, however, you will need mipmaps.
To ensure transitions between mipmap levels are seamless, make sure you use trilinear filtering on your mipmapped assets, as shown below. “Trilinear” might add a little performance overhead, but it is usually worth it to avoid visible seams where one mip level switches to the next.
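The memory cost of a full mip chain is easy to compute: each level halves both dimensions, so the whole chain adds roughly one third to the size of the top level. The sketch counts texels only; compressed formats scale the same way apart from padding of the smallest levels to whole blocks.

```python
from typing import List, Tuple


def mip_chain(width: int, height: int) -> List[Tuple[int, int]]:
    """Dimensions of every mip level: halve each axis (never below 1) down to 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels


def mip_overhead(width: int, height: int) -> float:
    """Extra memory used by the full chain, relative to the top level alone."""
    total = sum(w * h for w, h in mip_chain(width, height))
    return total / (width * height) - 1.0


print(len(mip_chain(1024, 1024)), f"{mip_overhead(1024, 1024):.3f}")
```

In return for that extra third of memory, a distant surface samples a small level whose texels fit in the cache, instead of scattering reads across the full-resolution image.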
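Trilinear filtering can be sketched as follows: the GPU derives a fractional mip level from how many texels fall on one pixel (plus any LOD bias), then blends the two nearest levels by the fractional part instead of jumping from one level to the next. That means up to twice as many texel fetches (8 instead of 4), which is the small overhead mentioned above. The mode labels and `max_level` default are illustrative.

```python
import math


def mip_lod(texels_per_pixel: float, lod_bias: float = 0.0, max_level: int = 10) -> float:
    """Fractional mip level: log2 of how many texels fall on one screen pixel along
    an axis, plus a bias, clamped to the levels that exist."""
    lod = math.log2(max(texels_per_pixel, 1e-9)) + lod_bias
    return min(max(lod, 0.0), float(max_level))


def mip_weights(lod: float, mode: str):
    """(mip level, weight) pairs blended for a given fractional level."""
    if mode == "nearest_mip":   # bilinear within one level: visible jumps at level changes
        return [(math.floor(lod + 0.5), 1.0)]
    if mode == "trilinear":     # blend the two closest levels by the fractional part
        lower = math.floor(lod)
        frac = lod - lower
        return [(lower, 1.0)] if frac == 0.0 else [(lower, 1.0 - frac), (lower + 1, frac)]
    raise ValueError(f"unknown mode: {mode}")


lod = mip_lod(6.0)   # about 2.58
print(mip_weights(lod, "nearest_mip"), mip_weights(lod, "trilinear"))
```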
Content optimisation for mobile
Mobile devices come with very particular constraints, as they need to run on battery for at least a day and stay cool enough in the user’s hand. This means that when porting a desktop game to mobile, power has to be considered along with performance when it comes to optimisations.
First and foremost, geometry complexity needs to be optimised. On desktop, the usual geometry count today can be two to three million polygons visible on screen, yet on mobile, this number is more like two to three hundred thousand polygons. After optimising the polygon count, the developer also needs to profile and verify that the vertex shaders are not overly complex.
Next, texture resolution and bandwidth usage (for example, from post-process effects) need to be adjusted for mobile devices. On desktop, GPU memory bandwidth is two to three hundred billion bytes per second; on mobile, the available memory bandwidth, shared between the CPU and GPU, is twenty to thirty billion bytes per second. While this means texture resolution potentially needs to be halved, mobile screens are also much smaller (20+ inches on desktop versus about five inches on mobile), so smaller textures are usually still sufficient. Finally, profile again after these adjustments to verify the results.
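Dividing those bandwidth figures by the frame rate gives a per-frame budget that makes the comparison concrete. The sketch uses the midpoints of the ranges above and 60 fps, and models a full-screen post-process pass as one read and one write of an RGBA8 1080p target; all of these are simplifying assumptions.

```python
def per_frame_budget_mb(bandwidth_gb_per_s: float, fps: float) -> float:
    """Memory traffic available per frame, in MB (1 MB = 10**6 bytes)."""
    return bandwidth_gb_per_s * 1e9 / fps / 1e6


def fullscreen_pass_mb(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    """Traffic of one full-screen pass that reads and writes one RGBA8 target."""
    return 2 * width * height * bytes_per_pixel / 1e6


desktop = per_frame_budget_mb(250, 60)   # midpoint of 200-300 billion bytes/s
mobile = per_frame_budget_mb(25, 60)     # midpoint of 20-30 billion bytes/s, shared with the CPU
post = fullscreen_pass_mb(1920, 1080)
print(f"desktop {desktop:.0f} MB/frame, mobile {mobile:.0f} MB/frame")
print(f"one 1080p post-process pass: {post:.1f} MB ({100 * post / mobile:.1f}% of the mobile budget)")
# Halving a texture's width and height cuts its memory and bandwidth to a quarter.
print((2048 * 2048) / (1024 * 1024))
```

Under these assumptions, each full-screen pass uses about 4% of the mobile frame budget before any texture or geometry traffic is counted, so a chain of post effects adds up quickly.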
As you can see, Unreal Engine 4 provides a wide range of options to optimise your games for mobile. There are a number of PowerVR specific options that you can tweak to make sure performance is optimal on our platforms as well as general optimisations for fine-tuning content.