Back in May, we were proud to announce Furian, the first new graphics architecture from PowerVR since Rogue. Rogue has been a mainstay of the GPU market since its introduction in 2012, and Rogue-based designs continue to lead on performance and power efficiency across a wide range of handsets used by millions of people around the world. Today, the recently refreshed Series8XE Plus range offers options from the entry level to the mid-range, while the Series7XT continues to make its presence felt in leading-edge chipsets such as MediaTek’s X30, which will be found in devices coming to market very soon.
However, with Furian we are delivering a step-change in performance efficiency, thanks to an architecture that is designed to meet the evolving needs of the market – moving from mobile gaming to VR and AR headsets, from 1080p to 4K with HDR, from photography to next-gen computer vision CNN applications and from running basic infotainment systems in cars to having a crucial role in ADAS functions for autonomous vehicles.
Furian has great features that make it a compelling option for the automotive space, and we’ll be addressing those in a future post. Here, though, we’ll focus on gaming and on some of the specific enhancements that enable Furian to push the boundaries of gaming, AR and VR performance.
Furian’s architectural changes
Before we talk about the new Furian-based Series8XT GT8525 IP core, let’s look at some key improvements to Furian’s internal architecture.
Just in case there was any doubt, Furian is built on the long-standing technologies that earned PowerVR its leading-edge reputation. The first is tile-based deferred rendering (TBDR), in which we only render what will be seen on screen. This gives it an inherent advantage over more wasteful architectures that internally render pixels that are never seen.
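To make the TBDR advantage concrete, here is a minimal Python sketch that counts how many fragments get shaded under two simplified models. It is a toy, not PowerVR’s implementation: `Fragment`, `shade`, `immediate_mode` and `tile_based_deferred` are hypothetical names, the immediate-mode model does an early depth test on arrival, and the scene is three overlapping layers submitted back to front (the worst case for early depth testing).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Fragment:
    tile: tuple    # (tile_x, tile_y) the fragment falls in
    pixel: tuple   # (x, y) in the framebuffer
    depth: float   # smaller means closer to the camera
    prim_id: int   # which primitive produced the fragment


def shade(frag):
    # Stand-in for an expensive pixel shader; the "colour" is just the primitive id.
    return frag.prim_id


def immediate_mode(fragments):
    """Depth-test each fragment on arrival and shade it if it passes.
    A fragment that passes now may still be overwritten later (overdraw)."""
    depth, color, shaded = {}, {}, 0
    for f in fragments:
        if f.depth < depth.get(f.pixel, float("inf")):
            depth[f.pixel] = f.depth
            color[f.pixel] = shade(f)
            shaded += 1
    return color, shaded


def tile_based_deferred(fragments):
    """Bin fragments per tile, resolve visibility for the whole tile first,
    then shade only the one visible fragment per pixel."""
    bins = defaultdict(list)
    for f in fragments:
        bins[f.tile].append(f)
    color, shaded = {}, 0
    for tile_frags in bins.values():
        nearest = {}  # per-tile hidden surface removal before any shading
        for f in tile_frags:
            if f.pixel not in nearest or f.depth < nearest[f.pixel].depth:
                nearest[f.pixel] = f
        for pixel, f in nearest.items():
            color[pixel] = shade(f)
            shaded += 1
    return color, shaded


def make_scene():
    # Three full-screen layers on a 4x4 framebuffer split into 2x2 tiles,
    # submitted back to front so every layer passes an early depth test.
    frags = []
    for prim_id, depth in enumerate([0.9, 0.5, 0.1]):
        for x in range(4):
            for y in range(4):
                frags.append(Fragment((x // 2, y // 2), (x, y), depth, prim_id))
    return frags


imr_color, imr_shaded = immediate_mode(make_scene())
tbdr_color, tbdr_shaded = tile_based_deferred(make_scene())
print(imr_shaded, tbdr_shaded)  # 48 16
```

Both models produce the same final image; the difference is that the deferred model shades 16 fragments instead of 48 because visibility is resolved before any shading work is spent.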
The second is PVRIC3 (PowerVR Image Compression 3). First introduced with our Rogue architecture, PVRIC is our lossless image compression format; it typically achieves a 2:1 compression ratio with no loss of quality and remains a critical bandwidth-saving element of our newest architecture.
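A quick back-of-the-envelope calculation shows what a typical 2:1 ratio means for memory traffic. This sketch assumes a single 32-bit RGBA framebuffer written once per frame at 1080p60 and ignores reads, other render targets and compression metadata; real PVRIC ratios depend on image content. The function name is hypothetical.

```python
def framebuffer_traffic_mb_per_s(width, height, bytes_per_pixel, fps, compression_ratio=1.0):
    """MB/s (10**6 bytes) needed to write one framebuffer per frame.
    compression_ratio=2.0 models the 'typical' 2:1 lossless ratio."""
    raw_bytes_per_s = width * height * bytes_per_pixel * fps
    return raw_bytes_per_s / compression_ratio / 1e6


uncompressed = framebuffer_traffic_mb_per_s(1920, 1080, 4, 60)
compressed = framebuffer_traffic_mb_per_s(1920, 1080, 4, 60, compression_ratio=2.0)
print(f"{uncompressed:.1f} MB/s -> {compressed:.1f} MB/s")  # 497.7 MB/s -> 248.8 MB/s
```

Because the compression is lossless, the halved traffic comes at no cost in image quality; the saving scales directly with resolution and frame rate.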
Simplified and more scalable
What’s crucial about Furian is that it’s designed for scalability. The Rogue architecture was very successful in real-world implementations of up to 12 shader clusters, but beyond that it could fall prey to overlong signal paths and congested routing. With Furian, the block layout has been altered to shorten the signal paths, making internal communication more efficient and making it easier to increase cluster counts without hurting efficiency. While our approach at Imagination is process-node agnostic, shortening signal paths ensures that the GPU scales well at sub-14nm manufacturing process nodes, with 7nm now starting to be adopted at the cutting edge.
When it comes to scaling up, moving from one Shader Processing Unit (SPU) to two now simply requires placing the second SPU down as a mirror image of the first: the main block is repeated, and the system block grows slightly to accommodate it. A three-SPU layout simply adds a copy of the first SPU, while a four-SPU layout is a mirror image of the two-SPU layout.
We also scale intelligently. Rather than adding an entire additional core to increase performance, we add only the blocks that are required, ensuring the most efficient use of expensive silicon area.
A look at the Furian block diagram reveals that the texture unit now has its own cache, so it no longer has to compete with the Unified Shading Clusters (USCs) for access to the main data cache, which boosts power efficiency.
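The contention that a dedicated texture cache removes can be pictured with a deliberately simple model. This sketch assumes every request hits, each cache port serves one request per cycle, and the request counts are made up; it says nothing about power directly, only about the queuing that keeps blocks active while they wait. The function name is hypothetical.

```python
def cycles_to_serve(usc_requests, texture_requests, dedicated_texture_cache):
    """Toy model: one request per cache port per cycle, all hits.
    Shared: USC and texture requests queue for the same data-cache port.
    Dedicated: the texture unit has its own cache, so both streams run in parallel."""
    if dedicated_texture_cache:
        return max(usc_requests, texture_requests)
    return usc_requests + texture_requests


shared = cycles_to_serve(1000, 800, dedicated_texture_cache=False)
dedicated = cycles_to_serve(1000, 800, dedicated_texture_cache=True)
print(shared, dedicated)  # 1800 1000
```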
The 2D Data Master and the Compute Data Master are also now fully asynchronous, enabling graphics and compute workloads to be processed at the same time. This will deliver a real-world improvement in AR, where compute could be used for vision processing while the rendering of in-game characters maintains a fast frame rate.
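As a CPU-side analogy only (this is not how the Data Masters are implemented), the following sketch overlaps a per-frame "graphics" job with a per-frame "vision" job on two worker threads and compares wall-clock time with running them back to back. `render_frame` and `track_camera` are hypothetical stand-ins that just sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def render_frame(i):
    time.sleep(0.01)  # hypothetical stand-in for a graphics workload
    return f"frame {i}"


def track_camera(i):
    time.sleep(0.01)  # hypothetical stand-in for a vision/compute workload
    return f"pose {i}"


def run(frames, overlap):
    """Run one graphics job and one compute job per frame, either back to back
    or overlapped on two workers, and return the results plus elapsed time."""
    start = time.perf_counter()
    results = []
    if overlap:
        with ThreadPoolExecutor(max_workers=2) as pool:
            for i in range(frames):
                gfx = pool.submit(render_frame, i)
                compute = pool.submit(track_camera, i)
                results.append((gfx.result(), compute.result()))
    else:
        for i in range(frames):
            results.append((render_frame(i), track_camera(i)))
    return results, time.perf_counter() - start


serial_results, serial_time = run(10, overlap=False)
async_results, async_time = run(10, overlap=True)
print(f"serial {serial_time:.3f}s, overlapped {async_time:.3f}s")
```

The results are identical either way; overlapping the two independent workloads roughly halves the elapsed time in this toy, which is the kind of headroom that lets vision processing run without stealing frame time from rendering.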
Another enhancement is that the 2D Data Master can now talk directly to parts of the Shader Processing Unit, bypassing any tiling work. This makes it much more efficient for tasks such as texture uploads, mipmap generation and simple UI blits. What does this mean? Essentially, fewer blocks are active, which helps to reduce power consumption.
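Mipmap generation is a good example of the simple, regular work that does not need the tiling path. This sketch only shows what the operation computes: each level halves the previous one by averaging 2x2 blocks (a box filter). It assumes a square, power-of-two, single-channel image stored as a list of rows; the filter a real driver or GPU uses may differ, and the function names are hypothetical.

```python
def next_mip_level(level):
    """Halve each dimension by averaging each 2x2 block (box filter)."""
    n = len(level) // 2
    return [
        [
            (level[2 * y][2 * x] + level[2 * y][2 * x + 1]
             + level[2 * y + 1][2 * x] + level[2 * y + 1][2 * x + 1]) / 4
            for x in range(n)
        ]
        for y in range(n)
    ]


def build_mip_chain(base):
    """Return [base, half, quarter, ..., 1x1]."""
    chain = [base]
    while len(chain[-1]) > 1:
        chain.append(next_mip_level(chain[-1]))
    return chain


base = [[0, 4, 8, 12],
        [4, 8, 12, 16],
        [8, 12, 16, 20],
        [12, 16, 20, 24]]
chain = build_mip_chain(base)
print([len(level) for level in chain])  # [4, 2, 1]
print(chain[1])  # [[4.0, 12.0], [12.0, 20.0]]
print(chain[2])  # [[12.0]]
```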
Wider and faster pipelines
A major enhancement is the set of changes made to the primary ALU pipeline. First, it has doubled in width from 16 to 32 pipelines, delivering higher ALU throughput for minimal additional silicon area. In Rogue, each pipeline contained two multiply-adds (MADs). However, close analysis of how shaders were actually used in practice showed that these two MADs were rarely fully utilised: data dependencies, together with the nature of the algorithms the shaders and kernels implement, made it difficult to use both at once. With Furian we therefore implemented one MAD and one multiply (MUL), which delivers more performance in the real world while requiring approximately the same area as Rogue’s two MADs. And in scenarios where two MAD operations really are required, Furian’s double-width pipeline lets it match Rogue’s performance, so nothing has been lost compared with the previous approach.
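The arithmetic behind this trade-off can be worked through directly. This sketch counts a MAD (a*b + c) as 2 FLOPs and a MUL as 1 FLOP, and assumes the 16 and 32 pipeline counts above are per-cluster peak figures; those are simplifying assumptions, not published specifications.

```python
# Peak per-clock figures for one cluster under the stated assumptions.
FLOPS_PER_OP = {"MAD": 2, "MUL": 1}


def flops_per_clock(lanes, alu_mix):
    """Peak FLOPs per clock if every ALU in every lane issues each cycle."""
    return lanes * sum(FLOPS_PER_OP[op] for op in alu_mix)


def mads_per_clock(lanes, alu_mix):
    """Peak MAD operations per clock (only MAD ALUs can issue a MAD)."""
    return lanes * alu_mix.count("MAD")


rogue = (16, ["MAD", "MAD"])
furian = (32, ["MAD", "MUL"])

print(flops_per_clock(*rogue), flops_per_clock(*furian))    # 64 96
print(flops_per_clock(*furian) / flops_per_clock(*rogue))   # 1.5
print(mads_per_clock(*rogue), mads_per_clock(*furian))      # 32 32
```

Under these assumptions, peak FLOPs per clock rise by 50%, in line with the “50% more FLOPS per clock” figure quoted for the GT8525, while MAD-only throughput is unchanged at 32 per clock, which is the “nothing has been lost” case.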
The changes don’t end there; other improvements boost internal efficiency too. First, workload submissions from the GPU driver no longer have to go through the Linux kernel module: the driver can now communicate with the GPU directly from user space, reducing overhead and latency. Second, local memory addressing is now optimised for multi-threaded GPU execution, so each ALU pipeline can address local memory directly, which improves scatter-access efficiency for many compute algorithms. The first released licensable IP core built on all of these architectural changes is the two-cluster PowerVR Series8XT GT8525.
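The post does not describe the submission mechanism, but user-space submission is commonly built around a shared-memory ring buffer. The following is a hypothetical single-producer, single-consumer sketch of that idea: the driver writes commands and advances a write index without a kernel call per submission, and the GPU side consumes from a read index. `SubmissionRing` and its methods are illustrative names only; a real design would also need memory barriers and a doorbell or interrupt mechanism.

```python
class SubmissionRing:
    """Hypothetical user-space command ring (single producer, single consumer).
    The producer (user-space driver) only advances `write`; the consumer
    (GPU firmware) only advances `read`. Indices grow without bound and are
    reduced modulo capacity when addressing a slot."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.capacity = capacity
        self.write = 0
        self.read = 0

    def submit(self, command):
        if self.write - self.read == self.capacity:
            return False  # ring full: caller must wait and retry
        self.slots[self.write % self.capacity] = command
        self.write += 1   # on hardware, a doorbell write would follow here
        return True

    def consume(self):
        if self.read == self.write:
            return None   # nothing pending
        command = self.slots[self.read % self.capacity]
        self.read += 1
        return command


ring = SubmissionRing(capacity=2)
accepted = [ring.submit(c) for c in ("draw A", "draw B", "draw C")]
drained = [ring.consume() for _ in range(3)]
print(accepted)  # [True, True, False]
print(drained)   # ['draw A', 'draw B', None]
```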
The Series8XT GT8525 offers the following improvements over the previous-generation Series7XT Plus:
- 100% fill-rate per clock increase
- Double the pixels per clock (PPC) throughput to 8PPC, compared with 4PPC for the Series7XT Plus, catering for higher resolution displays and additional performance for previously fill-rate limited use cases
- 50% more FLOPS per clock
- More efficiently accessible GFLOPS, making it easier to exploit the core’s full potential for graphics and compute use cases
- 66% performance density improvement in the GT8525 (two clusters) compared with the GT7400 Plus (four clusters)
- Better performance for lower silicon area cost also translating into improved power efficiency
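To see why doubling pixels per clock matters for higher-resolution displays, this sketch converts PPC into a peak fill rate and into a per-frame budget of pixel writes for a 4K display at 60 Hz. The 800 MHz clock is a hypothetical value chosen for illustration, not a published GT8525 figure, and the model ignores blending cost, MSAA and memory limits.

```python
CLOCK_MHZ = 800                      # hypothetical clock, illustration only
WIDTH, HEIGHT, FPS = 3840, 2160, 60  # 4K display refreshed at 60 Hz


def peak_fill_rate(pixels_per_clock, clock_mhz):
    """Peak pixels written per second."""
    return pixels_per_clock * clock_mhz * 1_000_000


def writes_per_pixel_per_frame(pixels_per_clock, clock_mhz, width, height, fps):
    """How many times each display pixel could be written per frame at peak
    fill rate: the budget for overdraw, blended layers and off-screen passes."""
    return peak_fill_rate(pixels_per_clock, clock_mhz) / (width * height * fps)


for label, ppc in [("Series7XT Plus (4 PPC)", 4), ("GT8525 (8 PPC)", 8)]:
    gpix = peak_fill_rate(ppc, CLOCK_MHZ) / 1e9
    budget = writes_per_pixel_per_frame(ppc, CLOCK_MHZ, WIDTH, HEIGHT, FPS)
    print(f"{label}: {gpix:.1f} Gpixel/s, ~{budget:.1f} writes per 4K60 pixel")
```

At any given clock, the per-frame budget doubles with PPC, which is where the extra headroom for fill-rate-limited use cases comes from.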
The Series8XT GT8525, then, is a GPU that is truly set up to deliver the performance that next-gen applications such as augmented reality and machine learning demand.
These raw performance increases build on the efficiencies within the PowerVR architecture that make it well suited to VR. Those efficiencies are all aimed at reducing ‘motion to photon latency’, which you can read about in more detail in this blog post. Furian also supports techniques such as Asynchronous Time Warp rendering, with tile-level granularity, so the rendering of an individual tile can be interrupted if it is detected that it is not required, further improving efficiency.
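One way to picture tile-level granularity is a scheduler that can only switch work at a boundary. In this hypothetical sketch, a high-priority time-warp pass is requested while a given tile is being rendered. With tile-granular switching it runs at the next tile boundary; with frame-granular switching it waits for the whole frame. The function and its parameters are illustrative assumptions, not a description of Furian’s hardware scheduler.

```python
def schedule(tiles, request_during, job, tile_granular=True):
    """Toy model: `job` (e.g. a time-warp pass) is requested while tile index
    `request_during` is rendering. Work can only switch at a boundary: after
    each tile if tile_granular, otherwise only after the whole frame."""
    log = []
    for i, tile in enumerate(tiles):
        log.append(f"render {tile}")
        if tile_granular and i == request_during:
            log.append(f"run {job}")
    if not tile_granular:
        log.append(f"run {job}")
    return log


tiles = ["T0", "T1", "T2", "T3"]
print(schedule(tiles, 1, "time-warp"))
# ['render T0', 'render T1', 'run time-warp', 'render T2', 'render T3']
print(schedule(tiles, 1, "time-warp", tile_granular=False))
# ['render T0', 'render T1', 'render T2', 'render T3', 'run time-warp']
```

The finer the switching granularity, the shorter the wait before the latency-critical warp can run, which is what matters for motion-to-photon latency.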
Naturally, the Series8XT GT8525 is designed to fully support Vulkan and is expected to pass the Khronos Conformance Testing Process, based on a published Khronos specification (check the Khronos website for current conformance status). Vulkan support lets developers reduce the CPU overhead of their games and make the fullest possible use of the GPU. This again helps ensure the most efficient use of an SoC, boosting battery life, which is of vital importance in AR and VR gaming.
Of course, the Series8XT GT8525 also offers the practical necessities such as support for digital rights management (DRM) necessary to protect high-value content. If you are a content developer and are thinking about developing next-gen content, such as a 3D VR 360° video stream of a football game, you’ll want to know your content can’t be easily ripped off.
PowerVR continues to extend its lead in performance and power efficiency by offering optimised solutions for each part of the market. With the Series8XE catering for the entry level and the Series8XE Plus ideal for the mid-range, the new Furian-based Series8XT can meet the demands of the high-end market with its highly streamlined and optimised architecture.
While current designs in the market, such as the Series7XT and Series8XE, are able to fulfil the needs of 3D gaming on high-resolution smartphone displays, next-gen experiences such as VR need even more performance, and Furian-based parts, of which the PowerVR Series8XT GT8525 is the first example, are poised to deliver it.