Micro-benchmark your render on PowerVR Series5, Series5XT and Series6 GPUs

Benchmarking the performance of applications running on PowerVR GPUs isn’t as simple as collecting time stamps as various points in your render. The reason for this is that the graphics driver submits work to the GPU independently from the API calls that an application has made, i.e. very few graphics API calls are blocking operations.

Additionally, PowerVR GPUs will try to process as much vertex and fragment work in parallel as possible to keep idle time to a minimum and make optimal use of the resources available. This means that to accurately measure performance you will need to render a number of frames and calculate the average frame-time to understand the true cost of your render.

PowerVR Graphics SDK v3.0

Because of this behaviour, it’s best to write micro-benchmarks to accurately measure performance. The aim of a micro-benchmark it is to understand the cost of rendering a static frame. Writing a benchmark for a dynamic scene (e.g. a fly-through mode in a game) is beyond the scope of this guide.

The following sections describe a simple, generic micro-benchmark. For the sake of simplicity, it assumes that the benchmark is written using the OpenGL ES 2.0 graphics API.

This benchmark guide makes use of glReadPixels() to force renders to complete. This is a very expensive operation as it removes all parallelism between the CPU and GPU so we recommend only using glReadPixels() when it’s absolutely necessary.

If you are not already familiar with the PowerVR GPU architecture, you should check out our PowerVR Series5 Architecture Guide for Developers document.

Platform setup

Before you begin benchmarking your application, you need to ensure that your target platform is setup appropriately.

Disabling V-Sync

V-Sync is a feature enabled on most platforms that synchronises the display’s refresh rate with GPU’s frame rate to avoid tearing (an artefact caused by the GPU updating a surface the display is still reading from). As V-Sync limits the number of frames that the GPU will process, it prevents you from accurately calculating the cost of your render. You have two options:

1. Disable V-Sync: If possible, you should disable V-Sync on your platform. This will remove the limit and will allow the GPU to render frames as fast as possible
2. Rendering off-screen: If you cannot disable V-Sync on you platform, you should repeatedly render to off-screen surfaces (e.g. OpenGL ES FBOs) to keep the GPU busy. Rendering off-screen is beyond the scope of this guide

Ensure no other application is using the PowerVR GPU

When benchmarking, you must ensure that the GPU is only processing work submitted by your application. If you’re unsure which processes are utilising the GPU, you can use PVRTune to profile the GPU and identify the processes that are submitting work to it.

If you cannot disable other processes that are using the GPU but they have a fixed cost (for example, the SurfaceFlinger compositor on some Android devices), you can factor this cost into your calculations and still run your benchmark on the device. Keep in mind that even with a fixed cost, your benchmark will be less accurate when other applications are using the GPUs resources.

You should not run your benchmark on the device if other processes using the GPU have a varying cost, as this will severely impact the accuracy of your tests.

What Should I Be Benchmarking?

Static scenes

A micro-benchmark should render a static scene so that the results are well defined. There should be no dynamic parts to the render. Additionally, the graphics API calls made in each frame should be consistent. Ideally, the benchmark should render identical frames over and over again to understand their average cost.

Asset Warm-Up

When writing a benchmark, the first thing to remember is that drivers don’t have to upload textures or compile shaders at the point that they were submitted to the graphics API. The graphics driver may, instead, defer this work until the first time that the resource is referenced by a draw call (this allows the driver to avoid uploading redundant resources that are never actually used in the render).

An asset warm-up phase allows you to force the driver to upload the resources that you will be using in your micro-benchmark.

How can I make the driver do that?

As the driver will upload and compile assets at the point that they are first used, the easiest way to force the operations is to do the following:

1. Render your static scene a number of times (~10 frames should do)
2. Call glReadPixels() before the final eglSwapBuffers(). This will force the driver to complete all renders that has been submitted so far. Reading back a 1×1 region is sufficient, as you don’t need the returned data
a. You only need to call glReadPixels() once here

Benchmarking the scene

Now that the driver has warmed up the required assets and we’re happy with the platform’s setup, it’s time to start benchmarking!

To get an accurate measure of the cost of your render, you should send a large number of frames to the GPU between your first timestamp and your last (the more frames, the better!). Processing a large number of frames allows the GPU to keep multiple frames in flight at a time (as it would to in a standard, well written application) and it also reduces the impact of any setup and shutdown costs caused by the benchmark. Here’s an overview of how this should be implemented:

1. Collect a time stamp after the asset warm-up frame has completed (eglSwapBuffers() has returned)
2. Render your static scene a large number of times (at least 10 seconds worth of rendering)
3. Call glReadPixels() before the last eglSwapBuffers() to force a render.
a. You only need to call glReadPixels() at the end of the benchmark. Do not call this every frame, as it will severely impact the benchmark performance.
4. Collect a time stamp after glReadPixels() returns
5. Calculate the average frame time (the elapsed time divides by the number of frames that were rendered)

…and you’re done!

My benchmark is more complex than this. What should I do?

You can use our PowerVR dedicated forum. We’re more than happy to help anyone who would like to accurately measure the performance of their graphics rendering on PowerVR hardware. If you would rather discuss your benchmark privately, you can email devtech@imgtec.com.

Fore more announcements and news from Imagination’s PowerVR ecosystem, follow us on Twitter (@PowerVRInsider and @ImaginationTech) and keep coming back to the PowerVR developers’ blog.

Search by Tag

Search for posts by tag.

Search by Author

Search for posts by one of our authors.

Featured posts
Popular posts

Blog Contact

If you have any enquiries regarding any of our blog posts, please contact:

United Kingdom

Tel: +44 (0)1923 260 511

Related blog articles

British Engineering Excellence Award

PowerVR Vision & AI design team collect another award

We’re delighted that the design team for our PowerVR Series2NX Neural Network Accelerator (NNA) has been honoured with a prestigious British Engineering Excellence Award (BEEA). The BEEAs were established in 2009 to demonstrate the high calibre of engineering design and innovation in the

Series8XT AR/VR Banner

Imagination Technologies: the ray tracing pioneers

After a period out of the spotlight, ray tracing technology has recently come back into focus, taking up a lot of column inches in the tech press. The primary reason is because graphics cards for the PC gaming market have

Amazon Fire Stick 4K pic

Amazon Lights up its Fire TV Stick 4K with PowerVR

Amazon, the internet shopping giant, announced earlier this week the latest version of its media streaming device, the Fire TV Stick 4K. First released in 2016, the Fire TV stick brings catch-up streaming services to any TV with an HDMI

Stay up-to-date with Imagination

Sign up to receive the latest news and product updates from Imagination straight to your inbox.

  • This field is for validation purposes and should be left unchanged.
Contact Us

Contact Us