Increasing performance and power efficiency in heterogeneous software

Share on linkedin
Share on twitter
Share on facebook
Share on reddit
Share on digg
Share on email

Heterogeneous architectures in embedded computing are fast becoming a reality – we indeed see many leading IP and semiconductor companies today building heterogeneous computing hardware.

In the article below, I’m going to describe one typical use case for heterogeneous computing and the challenges that result from moving to a heterogeneous programming model.

Running a beautification algorithm on a modern SoC

The diagram below illustrates how a video recording application that performs beautification might be implemented using a number of heterogeneous hardware and software components. In this example, input frames captured by the ISP/camera are first inspected by the GPU to determine the position of a face and its individual features (i.e. eyes, lips, nose and possibly others), passing these coordinates along to the CPU which tracks and automatically adjusts the camera focus and exposure to maintain high quality video. The CPU also determines which parts of the face contain skin colour, and the GPU applies a bilinear filter which smooths these textures, removing artefacts that represent blemishes and wrinkles, while preserving sharpness around the edges of the face.

A vision software pipeline implemented on top of heterogeneous architectures

The sequence of transformed images is output to both the hardware encoder for recording to disk and to the display subsystem for rendering in a preview window. As an additional optimization, the CPU could instruct the hardware encoder to encode the face coordinates at higher-fidelity than the background, optimizing both video quality and file size. In this scenario, at least five different hardware components require access to the image data in memory.

Memory bandwidth constraints

A key characteristic of many SoCs is the presence of a single unified system memory such as an off-chip DDR DRAM, which is shared between all hardware components. These components typically communicate with other components and with memory using a shared bus or interconnect, the bandwidth of which is tightly constrained to limit implementation area and cost. SoC bandwidth is frequently an order of ten times less than is common on desktop-class machines with PCI Express buses, and is a common performance bottleneck–particularly in cases where multiple hardware components attempt to access memory and other I/O at the same time.

GPU memory bandwidth

Furthermore, when an application passes ownership of data between different hardware components, the underlying operating system may create a duplicate copy of the data in memory. In some cases this may be due to hardware limitations, for example where the GPU requires access to data allocated by the CPU in virtual memory that CPU can page to disk at will. In other cases this may be related to image formats; for example, the ISP produces image sensor data in YUV format but the CPU or GPU needs to filter this data in RGB colour space.

Conversely, some operating system such as Android automatically convert images from YUV to RGB format before presenting the data to developers (for example, as OGLES_TEXTURE_2D textures); this can reduce the efficiency of many vision algorithms that only need to process image luminance data. The inefficiencies introduced by these behind-the-scene copies can be quickly compounded when processing high-resolution image data at video rate.

Join us next time to see how our engineering team has addressed the SoC bandwidth issues above by creating an innovative PowerVR Imaging Framework.

Further reading

Here is a menu to help you navigate through every article published in this heterogeneous compute series:

 

Please let us know if you have any feedback on the materials published on the blog and leave a comment on what you’d like to see next. Make sure you also follow us on Twitter (@ImaginationTech, @GPUCompute and @PowerVRInsider) for more news and announcements from Imagination.

Alex Kelley

Alex Kelley

Alex Kelley has over 20-years of sales, marketing, and general management experience in 3D computer graphics and has worked in the USA and several countries in Asia. Since joining Imagination Technologies Alex has launched the Visualizer brand, which has most recently brought a photorealistic virtual camera to SketchUp transforming the way people view 3D models. Alex was a Vice President at Caustic Graphics, a start up acquired by Imagination, and before that held Vice President roles at Autodesk and Alias. Alex is fluent in Japanese, and holds a B.S. and M.S. degree in Computer Science from Arizona State University.

Please leave a comment below

Comment policy: We love comments and appreciate the time that readers spend to share ideas and give feedback. However, all comments are manually moderated and those deemed to be spam or solely promotional will be deleted. We respect your privacy and will not publish your personal details.

Blog Contact

If you have any enquiries regarding any of our blog posts, please contact:

United Kingdom

benny.har-even@imgtec.com
Tel: +44 (0)1923 260 511

Search by Tag

Search by Author

Related blog articles

Improving Particle Systems on PowerVR

This article is part of our ongoing series about the PowerVR Performance Recommendations. Today we’ll be focusing on what particle systems are and how they can affect performance, so if you’re interested, why not take a look?

Read More »

Connect

Sign up to receive the latest news and product updates from Imagination straight to your inbox.

  • This field is for validation purposes and should be left unchanged.

Subscribe to our newsletter

  • This field is for validation purposes and should be left unchanged.