Micro-benchmark your render on PowerVR Series5, Series5XT and Series6 GPUs


Benchmarking the performance of applications running on PowerVR GPUs isn’t as simple as collecting time stamps at various points in your render. The reason for this is that the graphics driver submits work to the GPU independently from the API calls that an application has made, i.e. very few graphics API calls are blocking operations.

Additionally, PowerVR GPUs will try to process as much vertex and fragment work in parallel as possible to keep idle time to a minimum and make optimal use of the resources available. This means that to accurately measure performance you will need to render a number of frames and calculate the average frame-time to understand the true cost of your render.


Because of this behaviour, it’s best to write micro-benchmarks to accurately measure performance. The aim of a micro-benchmark is to understand the cost of rendering a static frame. Writing a benchmark for a dynamic scene (e.g. a fly-through mode in a game) is beyond the scope of this guide.

The following sections describe a simple, generic micro-benchmark. For the sake of simplicity, it assumes that the benchmark is written using the OpenGL ES 2.0 graphics API.

This benchmark guide makes use of glReadPixels() to force renders to complete. This is a very expensive operation as it removes all parallelism between the CPU and GPU so we recommend only using glReadPixels() when it’s absolutely necessary.

If you are not already familiar with the PowerVR GPU architecture, you should check out our PowerVR Series5 Architecture Guide for Developers document.

Platform setup

Before you begin benchmarking your application, you need to ensure that your target platform is set up appropriately.

Disabling V-Sync

V-Sync is a feature enabled on most platforms that synchronises the display’s refresh rate with the GPU’s frame rate to avoid tearing (an artefact caused by the GPU updating a surface the display is still reading from). As V-Sync limits the number of frames that the GPU will process, it prevents you from accurately calculating the cost of your render. You have two options:

1. Disable V-Sync: If possible, you should disable V-Sync on your platform. This will remove the limit and will allow the GPU to render frames as fast as possible
2. Render off-screen: If you cannot disable V-Sync on your platform, you should repeatedly render to off-screen surfaces (e.g. OpenGL ES FBOs) to keep the GPU busy. Rendering off-screen is beyond the scope of this guide

Ensure no other application is using the PowerVR GPU

When benchmarking, you must ensure that the GPU is only processing work submitted by your application. If you’re unsure which processes are utilising the GPU, you can use PVRTune to profile the GPU and identify the processes that are submitting work to it.

If you cannot disable other processes that are using the GPU but they have a fixed cost (for example, the SurfaceFlinger compositor on some Android devices), you can factor this cost into your calculations and still run your benchmark on the device. Keep in mind that even with a fixed cost, your benchmark will be less accurate when other applications are using the GPU’s resources.
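One simple way to factor in a fixed cost is to subtract it from the measured average frame time. The sketch below assumes that you have measured the other process’s per-frame GPU cost separately (for example with PVRTune) and that the two costs simply add together. Neither assumption comes from this guide, and because the GPU overlaps work, the result is only an approximation. The function name adjusted_frame_ms is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: subtract a separately measured, fixed per-frame GPU
 * cost (e.g. a compositor pass) from the benchmark's average frame time.
 * Assumes the two costs add linearly, which is only an approximation. */
double adjusted_frame_ms(double measured_avg_ms, double fixed_cost_ms)
{
    assert(measured_avg_ms >= 0.0 && fixed_cost_ms >= 0.0);
    double adjusted = measured_avg_ms - fixed_cost_ms;
    /* Clamp at zero: a render cannot have a negative cost. */
    return adjusted > 0.0 ? adjusted : 0.0;
}
```

For example, a measured average of 20 ms with a fixed 4 ms compositor cost would give an estimate of 16 ms for your own render.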

You should not run your benchmark on the device if other processes using the GPU have a varying cost, as this will severely impact the accuracy of your tests.

What Should I Be Benchmarking?

Static scenes

A micro-benchmark should render a static scene so that the results are well defined. There should be no dynamic parts to the render. Additionally, the graphics API calls made in each frame should be consistent. Ideally, the benchmark should render identical frames over and over again to understand their average cost.

Asset Warm-Up

When writing a benchmark, the first thing to remember is that drivers don’t have to upload textures or compile shaders at the point that they are submitted to the graphics API. The graphics driver may, instead, defer this work until the first time that the resource is referenced by a draw call (this allows the driver to avoid uploading redundant resources that are never actually used in the render).

An asset warm-up phase allows you to force the driver to upload the resources that you will be using in your micro-benchmark.

How can I make the driver do that?

As the driver will upload and compile assets at the point that they are first used, the easiest way to force the operations is to do the following:

1. Render your static scene a number of times (~10 frames should do)
2. Call glReadPixels() before the final eglSwapBuffers(). This will force the driver to complete all renders that have been submitted so far. Reading back a 1×1 region is sufficient, as you don’t need the returned data
a. You only need to call glReadPixels() once here
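
The warm-up steps above can be sketched in C as follows. This is a minimal illustration of the call ordering, not code from the PowerVR SDK: the WarmUpHooks structure, warm_up_assets() and the counting placeholders are all hypothetical names. The real rendering, glReadPixels() and eglSwapBuffers() calls are assumed to live behind the three function pointers, so the sketch builds without a GL context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application hooks; these names are illustrative only. */
typedef struct {
    void (*draw_scene)(void *state);  /* submit the static scene's draw calls */
    void (*read_back)(void *state);   /* e.g. glReadPixels() on a 1x1 region  */
    void (*swap)(void *state);        /* e.g. eglSwapBuffers()                */
    void *state;                      /* opaque data passed to every hook     */
} WarmUpHooks;

/* Render `frames` identical frames and force completion exactly once,
 * just before the final swap. */
void warm_up_assets(const WarmUpHooks *hooks, int frames)
{
    assert(hooks != NULL && frames > 0);
    for (int i = 0; i < frames; ++i) {
        hooks->draw_scene(hooks->state);
        if (i == frames - 1) {
            hooks->read_back(hooks->state);  /* last frame only */
        }
        hooks->swap(hooks->state);
    }
}

/* Placeholder hooks that only count calls, so the sketch builds and runs
 * without a GL context. Replace them with real rendering code. */
typedef struct {
    int draws;
    int read_backs;
    int swaps;
    int swaps_before_read_back;  /* swaps already issued when read_back ran */
} CallCounts;

void count_draw(void *state) { ((CallCounts *)state)->draws++; }
void count_swap(void *state) { ((CallCounts *)state)->swaps++; }
void count_read_back(void *state)
{
    CallCounts *counts = (CallCounts *)state;
    counts->read_backs++;
    counts->swaps_before_read_back = counts->swaps;
}
```

Passing the GL work in as callbacks keeps the ordering logic separate from the platform-specific EGL setup, which varies between devices.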

Benchmarking the scene

Now that the driver has warmed up the required assets and we’re happy with the platform’s setup, it’s time to start benchmarking!

To get an accurate measure of the cost of your render, you should send a large number of frames to the GPU between your first timestamp and your last (the more frames, the better!). Processing a large number of frames allows the GPU to keep multiple frames in flight at a time (as it would in a standard, well-written application) and it also reduces the impact of any setup and shutdown costs caused by the benchmark. Here’s an overview of how this should be implemented:

1. Collect a time stamp after the asset warm-up frame has completed (eglSwapBuffers() has returned)
2. Render your static scene a large number of times (at least 10 seconds’ worth of rendering)
3. Call glReadPixels() before the last eglSwapBuffers() to force all submitted renders to complete.
a. You only need to call glReadPixels() at the end of the benchmark. Do not call this every frame, as it will severely impact the benchmark performance.
4. Collect a time stamp after glReadPixels() returns
5. Calculate the average frame time (the elapsed time divided by the number of frames that were rendered)

…and you’re done!
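
The timing side of the steps above can be sketched in C like this. It is a hedged example rather than a definitive implementation: BenchTimer, bench_start(), bench_stop() and average_frame_seconds() are hypothetical names, and clock_gettime() with CLOCK_MONOTONIC assumes a POSIX platform such as Linux or Android. The GL calls appear only in the closing comment, because they need a live context; draw_scene(), display, surface and pixel are placeholders for your own code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime(); assumes POSIX */
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical helper type; not part of any PowerVR or Khronos API. */
typedef struct {
    double start_seconds;
} BenchTimer;

/* Monotonic time in seconds, unaffected by wall-clock adjustments. */
double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Step 1: take the first time stamp once the warm-up swap has returned. */
void bench_start(BenchTimer *timer)
{
    timer->start_seconds = monotonic_seconds();
}

/* Step 5: average frame time = elapsed time / number of frames. */
double average_frame_seconds(double elapsed_seconds, long frames)
{
    assert(frames > 0);
    return elapsed_seconds / (double)frames;
}

/* Step 4: take the second time stamp (call this right after glReadPixels()
 * returns) and convert the elapsed time into an average frame time. */
double bench_stop(const BenchTimer *timer, long frames)
{
    double elapsed = monotonic_seconds() - timer->start_seconds;
    return average_frame_seconds(elapsed, frames);
}

/*
 * Intended call order inside the application:
 *
 *   BenchTimer timer;
 *   bench_start(&timer);                        // step 1
 *   for (long i = 0; i < frames; ++i) {         // step 2
 *       draw_scene();
 *       if (i == frames - 1) {                  // steps 3 and 4
 *           glReadPixels(0, 0, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixel);
 *           average = bench_stop(&timer, frames);
 *       }
 *       eglSwapBuffers(display, surface);
 *   }
 */
```

Choose the frame count so that the whole run lasts at least 10 seconds on your device. A short trial run gives a rough frame time from which to estimate it.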

My benchmark is more complex than this. What should I do?

You can ask on our dedicated PowerVR forum. We’re more than happy to help anyone who would like to accurately measure the performance of their graphics rendering on PowerVR hardware. If you would rather discuss your benchmark privately, you can email devtech@imgtec.com.

For more announcements and news from Imagination’s PowerVR ecosystem, follow us on Twitter (@PowerVRInsider and @ImaginationTech) and keep coming back to the PowerVR developers’ blog.

Joe Davis

Joe Davis leads the PowerVR Graphics developer support team. He and his team support a wide variety of graphics developers including those writing games, middleware, UIs, navigation systems, operating systems and web browsers. Joe regularly attends and presents at developer conferences to help graphics developers get the most out of PowerVR GPUs. You can follow him on Twitter @joedavisdev.
