A primer on mobile systems used for heterogeneous computing

In the mobile and embedded market, the design constraints of electronic products can sometimes be seen as tight and contradictory: the market demands higher performance yet lower power consumption, reductions in cost but shorter time-to-market.

These constraints have created a trend for more specialized hardware designs that fit a particular application; if each task is well matched to a functional unit, fewer transistors are wasted and power efficiency is better. As a result, application processors have become increasingly heterogeneous over time, integrating multiple components into a single System-on-Chip (SoC).

The diagram below presents the architecture of a modern SoC. Such a chip typically includes a CPU (with optional multi-core and SIMD capabilities), a GPU for both 3D graphics acceleration and high-performance vector computation, an ISP (Image Signal Processor) for acquiring image sensor data, a VDE (Video Decoder and Encoder) for codec acceleration and an RPU (Radio Processing Unit) for connectivity. Each of these components has its own advantages and combinations of these can be used to implement many applications efficiently.

Heterogeneous computing SoC architecture

Today many application developers rely on the CPU to meet the requisite performance requirements of their advanced computational photography and computer vision algorithms. However, these CPU-centric solutions frequently struggle to deliver sustained video-rate processing of high-resolution images, largely due to thermal limits of the devices.

As shown in more detail in the figure below, a CPU combines a small number of cores with a large data cache, optimized for efficient execution of general-purpose control code with low memory latency. The GPU, on the other hand, dedicates its transistors to ALUs (arithmetic logic units) rather than data caches and control flow. This arrangement of hardware enables efficient execution of large unbranched data sets that require many repetitive arithmetic calculations, such as an image processing algorithm operating on many pixels.

Heterogeneous computing: CPU vs GPU architecture

Furthermore, because the GPU is designed to run at lower clock speeds than a CPU, offloading image processing workloads from the CPU to GPU can lead to both an increase in performance and a reduction in power consumption and generated heat. The resulting implementation is also likely to be more balanced and also more responsive, as the CPU has more free cycles to respond to the demands of the operating system and user interface.

In the context of mobile and embedded software, heterogeneous computing is the process of combining different types of processing units together to meet an application’s performance requirements within a limited power and thermal budget. By partitioning the application into multiple workloads that can be distributed across the available hardware units, so that each workload is run on the hardware unit capable of executing it most efficiently, the overall performance and power-efficiency of the implementation can be improved.

When partitioning an application, serial tasks should usually be allocated to the CPU, whereas data-parallel tasks are good candidates for offloading to the GPU. If the SoC provides dedicated hardware accelerators such as an ISP or VDE, related tasks such as image de-noising and video playback should usually be allocated to these accelerators in order to maximise power-efficiency.

However, in some cases it may be desirable to implement these tasks in software instead, for example using GPU compute, to trade efficiency for a higher-quality algorithm than may be provided by the fixed-function accelerator. The use of GPU compute is particularly common in the field of computer vision where active research is continually leading to refinements of existing algorithms as well as entirely new vision algorithms. Fast deployment of these algorithms into products requires both programmability and a high-performance compute capability.

Join us next time for an example use case of heterogeneous computing and the existing bandwidth constraints SoCs currently face.

Further reading

Here is a menu to help you navigate through every article published in this heterogeneous compute series:

Please let us know if you have any feedback on the materials published on the blog and leave a comment on what you’d like to see next. Make sure you also follow us on Twitter (@ImaginationTech, @GPUCompute and @PowerVRInsider) for more news and announcements from Imagination.


1 thought on “A primer on mobile systems used for heterogeneous computing”

  1. Balancing the design of a system with enough specialized hardware for the range of common workloads it’s projected to have to do in its lifetime would seem like a common sense priority for the designer of the system, yet such a basic principle has eluded the lead developers of even major projects on occasion in the past.

Leave a Comment

Search by Tag

Search for posts by tag.

Search by Author

Search for posts by one of our authors.

Featured posts
Popular posts

Blog Contact

If you have any enquiries regarding any of our blog posts, please contact:

United Kingdom

Tel: +44 (0)1923 260 511

Related blog articles

The ultimate embedded GPUs for the latest applications

Introducing PowerVR Series9XEP, Series9XMP, and Series9XTP As Benjamin Franklin once said, only three things in life are certain: death, taxes and the ongoing rapid advancement of GPUs for embedded applications*. Proving his point, this week, Imagination has once again pushed

Opinion: the balance between edge and cloud.

Simon Forrest explains how embedded chips can meet the challenge of delivering true local AI processing. GPUs and NNAs are rapidly becoming essential elements for AI on the edge. As companies begin to harness the potential of using neural networks

DJI Mavic 2 closed

Partner interview series: DJI – flying high

DJI is a name now synonymous with drones, thanks to an estimated 74% market share across consumer, professional and enterprise markets. In the second of our ‘quick chat’ interview series, we speak to we talk to Charlie Sun, R&D Director at DJI to find

PowerVR Series2NX collects another award at China tech summit

Following success at other award ceremonies this year, PowerVR’s Series2NX neural network accelerator has picked up another gong. This time, the panel of judges at the ASPENCORE Global Electronic Achievement Awards Annual EDA/IP award ceremony deemed it to be the most

Stay up-to-date with Imagination

Sign up to receive the latest news and product updates from Imagination straight to your inbox.

  • This field is for validation purposes and should be left unchanged.