Android is Google’s mobile and embedded operating system, targeting consumer electronics products that range from smartphones and tablets to handheld gaming consoles, smart TVs and even wall ovens. Yes, you read that correctly: an oven capable of running OpenGL ES apps. But graphics APIs aside, the latest version of the Android operating system (Android 4.2 Jelly Bean) also includes two compute APIs that will open up new worlds for app developers: Renderscript Compute and Filterscript.

Renderscript is not new in itself. Android 4.0-based devices could already use it for graphics, until Android 4.1 was introduced and the Renderscript graphics engine was deprecated. The compute part of the API, however, had certain limitations (for example, it could only run on the CPU). With Android 4.2, things have moved in a good direction for GPU compute enthusiasts: not only has Google enabled Renderscript to execute on the GPU, it has also introduced a second API, called Filterscript, that targets specific programming use cases.

Filterscript’s relevance for mobile devices

Filterscript is essentially a carefully chosen subset of the Renderscript APIs that allows developers to run code on a potentially wider variety of processors (CPUs, GPUs, and DSPs). For example, a script can declare through a pragma that it does not require strict IEEE 754-2008 floating point semantics, which tells the Renderscript runtime it is free to use hardware with relaxed precision. Filterscript also uses a different file extension from Renderscript (.fs instead of .rs). The changes from Renderscript Compute were designed explicitly to optimise parallel processing cases that predominantly run on the GPU, offering developers a new set of tools dedicated to pixel processing (similar to Khronos APIs).
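As a rough illustration of the per-pixel kernel style described above, here is a plain C sketch of the shape such a kernel takes: one function that maps an input pixel to an output pixel using single-precision arithmetic and touches no other memory. This is not Filterscript source and does not use the Renderscript runtime. The names pixel_rgba, clamp_u8, brightness_kernel and for_each_pixel are made up for this example, and the serial loop merely stands in for whichever processor a runtime might choose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pixel type: one RGBA pixel, 8 bits per channel. */
typedef struct {
    uint8_t r, g, b, a;
} pixel_rgba;

/* Clamp a float to [0, 255] and round to the nearest integer. */
uint8_t clamp_u8(float v) {
    if (v < 0.0f) return 0;
    if (v > 255.0f) return 255;
    return (uint8_t)(v + 0.5f);
}

/* Per-pixel kernel: receives one pixel by value and returns one pixel.
 * It reads no neighbours and writes no shared state, so every call is
 * independent of every other call. */
pixel_rgba brightness_kernel(pixel_rgba in, float gain) {
    pixel_rgba out;
    out.r = clamp_u8((float)in.r * gain);
    out.g = clamp_u8((float)in.g * gain);
    out.b = clamp_u8((float)in.b * gain);
    out.a = in.a; /* alpha is passed through unchanged */
    return out;
}

/* Serial stand-in for a runtime: applies the kernel to each pixel.
 * Because the calls are independent, a real runtime could spread them
 * across many execution units instead of looping. */
void for_each_pixel(const pixel_rgba *src, pixel_rgba *dst,
                    size_t count, float gain) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        dst[i] = brightness_kernel(src[i], gain);
    }
}
```

The design point the sketch tries to capture is that the kernel body never needs to know where it runs; only the driver loop would change between a CPU, GPU or DSP back end.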

The benefits of Filterscript over Renderscript relate to cross-platform development: it extends to other processors the portability that Renderscript brought to the CPU. It reduces instruction set lock-in and offers heterogeneous platforms a better opportunity to benefit from GPU acceleration.

This new compute API is particularly suited to image processing operations and aims to replace kernels that one would typically write in GLSL. Since we have a long history of helping companies develop specific use cases for GPU compute on PowerVR Series5 and Series5XT cores, we’ve developed a Filterscript demo, run a selection of the most popular Android image processing scripts on three computing platforms integrating a PowerVR SGX544 GPU, and observed the results.

Image adjustment example running on a PowerVR SGX544SC-based platform with all filters turned on

Image adjustment example running on a PowerVR SGX544SC-based platform with one filter turned on

PowerVR is the best mobile GPU compute architecture

Before delving into our findings, there are a few things to mention. First and foremost, Imagination’s PowerVR architecture has been designed for efficiency from day one and continues to provide the most power-efficient family of GPUs available on the market today. Key advantages such as the TBDR (Tile-Based Deferred Rendering) approach to rendering graphics, the PVRTC technology for texture compression, and a unified architecture focused on both fillrate and GPU compute efficiency provide our partners with the tools they require to succeed in a dynamic market.

Image adjustment example running on the dual-core CPU (left) and the PowerVR SGX544SC GPU (right)

Our dedicated engineering team has put a lot of effort into providing the mobile and embedded market with the best solution: one that is optimized for low power yet still able to deliver unmatched performance. Both partners and industry analysts have confirmed the low power characteristics of PowerVR SGX and ‘Rogue’ cores.

Another important point is that in mobile, power is critically important. While clock frequency is no longer a problem and silicon area is less of a concern, all designs become power limited, so efficiency determines performance. This is where our carefully balanced design and robust architecture come into play, offering a scalable roadmap that can be integrated with a wide range of CPU and bus interconnect architectures and supports all major compute APIs such as OpenCL, Renderscript Compute, Filterscript and, for future PowerVR generations, standards promoted by groups like the HSA Foundation.

Filterscript examples and final words

Developers have found Filterscript very efficient at handling simple scripts that might otherwise have been written in GLSL. Many apps also have large amounts of C or C++ pre-processing code that runs before the final operations in Filterscript. We have therefore ported existing OpenCL applications, as well as developed new Filterscript code, to test how well PowerVR GPUs handle GPU compute code, and so far the results have been very promising.

But before we give you the results, here is a comparison of the peak GFLOPS performance of each platform, when looking at the total processing power of the respective CPUs and GPUs.

Chart: peak GFLOPS of the multicore CPUs and PowerVR Series5XT GPUs
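Peak figures of this kind are usually theoretical maxima rather than measurements. A minimal sketch of the common back-of-the-envelope formula (execution units × floating-point operations per unit per cycle × clock frequency) is given below in C. The function name peak_gflops and every number in the usage note are hypothetical; they are not the values behind the chart.

```c
#include <assert.h>

/* Theoretical peak throughput in GFLOPS.
 *   units                    - number of cores or execution units (assumed input)
 *   flops_per_unit_per_cycle - floating-point ops each unit can issue per clock
 *   clock_mhz                - clock frequency in MHz
 * Dividing by 1000 converts MFLOPS (because the clock is in MHz) to GFLOPS. */
double peak_gflops(unsigned units,
                   unsigned flops_per_unit_per_cycle,
                   double clock_mhz) {
    assert(clock_mhz >= 0.0);
    return (double)units * (double)flops_per_unit_per_cycle * clock_mhz / 1000.0;
}
```

For instance, peak_gflops(2, 4, 1000.0) and peak_gflops(4, 8, 250.0) both evaluate to 8.0: a wider processor can match the peak of a narrower one at a quarter of the clock frequency. These inputs are made up purely to show the arithmetic.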

The performance comparison charts below show how some well-known image processing filters implemented in Filterscript and Renderscript performed on each of the three platforms, all of which integrate a PowerVR Series5XT GPU.

Chart: Renderscript and Filterscript performance on the PowerVR SGX544MP3-based platform

Chart: Renderscript and Filterscript performance on the PowerVR SGX544MP2-based platform

Chart: Renderscript and Filterscript performance on the PowerVR SGX544SC-based platform

Notice that even though some scripts have roughly the same performance on both the multicore CPUs and the GPUs, it is important to remember each processor’s running frequency and peak power consumption. The important takeaway for Filterscript and Renderscript running on the PowerVR GPU is that you get similar or better performance (three to seven times better in some cases) at a much lower frequency, and therefore much lower system power consumption. Furthermore, by offloading parts of your application to the PowerVR GPU, the CPU is free to handle other tasks and overall system efficiency increases as well.

We’ve seen GPU compute examples where moving the code to the GPU achieved massive savings of up to 1.5W at the system level. Taking the time to optimize your code to run on the appropriate processor for each task can therefore impact the overall user experience in a meaningful way.

Interested in Imagination’s PowerVR GPUs and Android’s APIs for GPU Compute? Then follow us on Twitter (@GPUCompute, @PowerVRInsider and @ImaginationPR) and keep coming back to our blog. We even have a dedicated tag where you can find all you need to know about GPU Compute, HPC, heterogeneous processing and other similar topics.