For the last decade, Imagination has been at the forefront of heterogeneous compute, becoming a founding member of the HSA Foundation and a contributor to many open heterogeneous computing standards available today, including OpenCL, OpenGL ES and Vulkan.

Our MIPS processors, PowerVR multimedia and Ensigma connectivity technologies have been integrated in many mobile and embedded computing platforms; each silicon IP family has been optimized to be a class leader in terms of performance while saving power and area.

Figure: GPU compute memory hierarchy in OpenCL

In a series of upcoming articles on our blog, my colleagues from the GPU compute group will look at how SoC designers and software developers can take advantage of the synergies that exist today in silicon and implement heterogeneous algorithms that deploy across multiple engines in a chip. To help you navigate the jargon of heterogeneous compute, I thought it would be useful to provide a short guide to the technical vocabulary that we are going to use.

Most of the terminology mentioned in the table below refers to our PowerVR Rogue GPU, OpenCL or GPU compute concepts in general:

Arithmetic Intensity: The ratio of the number of arithmetic operations to the number of memory operations performed.
Barrier: In OpenCL, a function used to synchronize work-items in a workgroup.
Coarse Grain Scheduler: A Rogue hardware block that distributes work-items to the available multiprocessors. (The work-items are first grouped into warps.)
Common Store: A Rogue hardware block comprising a register bank, shared between all resident work-items. All registers are visible to all work-items residing on the multiprocessor.
Kernel: In OpenCL, the source code that is executed by each work-item in an NDRange.
Memory Fence: In OpenCL, a point in the code at which all pending loads and stores are guaranteed to have completed before any subsequent loads and stores begin.
Multiprocessor: A Rogue hardware block that manages the concurrent execution of multiple warps.
NDRange: In OpenCL, an N-dimensional virtual grid of workgroups, where N can equal 1, 2 or 3. All work-items in the NDRange execute the same kernel and can run concurrently where hardware resources allow.
Texture Processing Unit: A Rogue hardware block that speeds up accesses to OpenCL images.
Unified Store: A Rogue hardware block comprising a register bank, shared between all resident work-items.
Warp: A grouping of up to 32 work-items.
Work-item: In OpenCL, one instance of an enqueued kernel.
Workgroup: In OpenCL, a grouping of work-items that can synchronize and share data between one another.
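
To make a few of these terms more concrete, here is a minimal Python sketch (not OpenCL code, and not a description of how the Rogue hardware actually schedules work) that models how a one-dimensional NDRange breaks down into workgroups and work-items, how many warps of up to 32 work-items a workgroup would occupy, and how arithmetic intensity is computed. The names WorkItem, enumerate_ndrange, warps_per_workgroup and arithmetic_intensity are hypothetical and chosen for illustration only. The sketch assumes that the global size is a multiple of the workgroup size and that a warp never spans two workgroups.

```python
from dataclasses import dataclass
from typing import List

WARP_SIZE = 32  # from the glossary: a warp groups up to 32 work-items


@dataclass(frozen=True)
class WorkItem:
    """One instance of an enqueued kernel (illustrative model only)."""
    global_id: int  # position in the whole NDRange, like get_global_id(0)
    group_id: int   # index of the owning workgroup, like get_group_id(0)
    local_id: int   # position inside the workgroup, like get_local_id(0)


def enumerate_ndrange(global_size: int, local_size: int) -> List[WorkItem]:
    """Split a 1-D NDRange into workgroups of local_size work-items."""
    # Assumption: uniform workgroups, so the sizes must divide evenly.
    if global_size % local_size != 0:
        raise ValueError("global_size must be a multiple of local_size")
    return [
        WorkItem(global_id=g, group_id=g // local_size, local_id=g % local_size)
        for g in range(global_size)
    ]


def warps_per_workgroup(local_size: int, warp_size: int = WARP_SIZE) -> int:
    """Warps needed to hold one workgroup (ceiling division).

    Assumption: a warp never mixes work-items from different workgroups.
    """
    return -(-local_size // warp_size)


def arithmetic_intensity(arithmetic_ops: int, memory_ops: int) -> float:
    """Ratio of arithmetic operations to memory operations."""
    return arithmetic_ops / memory_ops


if __name__ == "__main__":
    items = enumerate_ndrange(global_size=256, local_size=64)
    print(items[70])                # WorkItem(global_id=70, group_id=1, local_id=6)
    print(warps_per_workgroup(64))  # 2
    # c[i] = a[i] * b[i] + c[i]: 2 arithmetic ops, 3 loads + 1 store
    print(arithmetic_intensity(2, 4))  # 0.5
```

In a real kernel, the three identifiers would come from the OpenCL C built-ins get_global_id, get_group_id and get_local_id rather than being computed by hand. The last example also shows why arithmetic intensity matters: a kernel that performs fewer arithmetic operations than memory operations is more likely to be limited by memory bandwidth than by compute throughput.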

If you find other terms in the articles that are unexplained or sound unfamiliar, please leave us a comment below and I will add them to the list.

Further reading

Here is a menu to help you navigate through every article published in this heterogeneous compute series:


Please let us know if you have any feedback on the materials published on the blog and leave a comment on what you’d like to see next. Make sure you also follow us on Twitter (@ImaginationPR, @GPUCompute and @PowerVRInsider) for more news and announcements from Imagination.


  • Ofer Rosenberg

    Regarding Memory Fence – Minor clarification: I believe that the right definition needs to be “All loads and stores of the same work-item are guaranteed to have completed”

  • Eulus Garza

    Any working link for the rest of the articles?