Deep dive: OpenCL face detection on PowerVR [part 3]

Imagination’s R&D group has developed a face detection algorithm that is based on a classifier cascade and optimized to run on mobile devices comprising a CPU and a PowerVR GPU. The algorithm employs several optimizations to improve performance and accuracy. In particular, instead of searching each entire frame for faces, the detector limits its search to regions in which faces were previously detected plus a few randomly selected regions. Tracking previously found faces ensures they are not lost, while testing a variety of other regions ensures that new faces are found quickly.

The main steps performed are illustrated in the figure below:

Figure: Block-level implementation of face detection on CPU and GPU

Source image preprocessing

The preprocessing kernel constructs three temporary images from a single source image:

  1. A mipmap containing multiple versions of the source image at different scales.
  2. A copy of the image in chromatic colour space.
  3. A single-channel image (or probability map) that for each pixel records the probability that the corresponding pixel in the source image has skin colour, calculated by comparing the colour in the chromatic image to the colour of faces detected in previous frames.
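As an illustration of the first item, the sketch below builds a mipmap chain on the CPU by repeatedly averaging 2×2 blocks. It is only a reference sketch: the `Gray` type, the box filter, the single-channel 8-bit format and the `minSize` cut-off are assumptions made for the example, not details taken from Imagination's kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical single-channel image; the real pixel format is not stated.
struct Gray {
    std::size_t width;
    std::size_t height;
    std::vector<std::uint8_t> px;  // row-major, width * height entries
};

// Produces the next mipmap level by averaging each 2x2 block (box filter).
Gray halve(const Gray& in) {
    assert(in.width >= 2 && in.height >= 2);
    Gray out{in.width / 2, in.height / 2, {}};
    out.px.resize(out.width * out.height);
    for (std::size_t y = 0; y < out.height; ++y) {
        for (std::size_t x = 0; x < out.width; ++x) {
            const std::size_t i = (2 * y) * in.width + 2 * x;
            const unsigned sum = in.px[i] + in.px[i + 1] +
                                 in.px[i + in.width] + in.px[i + in.width + 1];
            out.px[y * out.width + x] =
                static_cast<std::uint8_t>((sum + 2) / 4);  // rounded mean
        }
    }
    return out;
}

// Builds levels until halving again would drop below minSize in either dimension.
std::vector<Gray> buildMipmap(Gray base, std::size_t minSize) {
    assert(minSize >= 1);
    std::vector<Gray> levels;
    levels.push_back(std::move(base));
    while (levels.back().width / 2 >= minSize &&
           levels.back().height / 2 >= minSize) {
        Gray next = halve(levels.back());
        levels.push_back(std::move(next));
    }
    return levels;
}
```

Searching a fixed-size detection window over each level is what lets a cascade trained at one window size find faces of different sizes.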

The chromatic image and probability map are stored at quarter-resolution, which is sufficient to preserve accuracy while minimising memory and bandwidth requirements.
The preprocessing kernel operates on pixels of the source image in parallel: each work-item processes a separate block of 4×4 pixels, outputting one pixel of the chromatic image and one pixel of the probability map.
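The per-block work can be sketched in C++ as follows. The article does not say which chromatic colour space or skin model is used, so this example assumes normalised rg chromaticity and a simple Gaussian fall-off around the mean chromaticity of previously detected faces. `Chroma`, `SkinModel`, `blockChroma` and `skinProbability` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb { std::uint8_t r, g, b; };
struct Chroma { float r, g; };  // assumed colour space: r = R/(R+G+B), g = G/(R+G+B)

// Assumed skin model: mean chromaticity of earlier detections plus a spread.
struct SkinModel { Chroma mean; float sigma; };

// What one work-item would compute for block (bx, by): the chromaticity of
// the summed colour of a 4x4 block of the row-major source image.
Chroma blockChroma(const std::vector<Rgb>& src, std::size_t width,
                   std::size_t bx, std::size_t by) {
    assert(width >= 4 && width % 4 == 0 && src.size() % width == 0);
    assert((bx + 1) * 4 <= width && (by + 1) * 4 <= src.size() / width);
    float sr = 0.0f, sg = 0.0f, sb = 0.0f;
    for (std::size_t dy = 0; dy < 4; ++dy) {
        for (std::size_t dx = 0; dx < 4; ++dx) {
            const Rgb& p = src[(by * 4 + dy) * width + (bx * 4 + dx)];
            sr += p.r;
            sg += p.g;
            sb += p.b;
        }
    }
    const float sum = sr + sg + sb;
    if (sum <= 0.0f) return {1.0f / 3.0f, 1.0f / 3.0f};  // black block: neutral
    return {sr / sum, sg / sum};
}

// Probability-map value for one chromatic pixel: 1 at the model mean,
// falling towards 0 as the colour moves away from it.
float skinProbability(Chroma c, const SkinModel& m) {
    assert(m.sigma > 0.0f);
    const float dr = c.r - m.mean.r;
    const float dg = c.g - m.mean.g;
    return std::exp(-(dr * dr + dg * dg) / (2.0f * m.sigma * m.sigma));
}
```

Because each block writes exactly one output pixel to each image, the blocks are independent of one another, which is what makes a one-work-item-per-block mapping straightforward.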

Tile generation

To facilitate parallel processing, the source image is divided into multiple tiles that can be processed independently on separate GPU clusters. Each tile is described by an integral image, which simplifies the computation of Haar-like features.
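To show why an integral image helps, here is a minimal CPU sketch for a single tile. Once the table is built, the sum over any rectangle takes four lookups, so a Haar-like feature costs a handful of reads regardless of its size. The `Integral` layout (one extra row and column of zeros) and the two-rectangle `haarEdge` feature are illustrative choices, not the detector's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Integral {
    std::size_t width, height;       // dimensions of the source tile
    std::vector<std::uint32_t> sum;  // (width + 1) x (height + 1); row 0 and column 0 are zero
    std::uint32_t at(std::size_t x, std::size_t y) const {
        return sum[y * (width + 1) + x];
    }
};

// at(x, y) holds the sum of all tile pixels above and to the left of (x, y).
Integral buildIntegral(const std::vector<std::uint8_t>& px,
                       std::size_t width, std::size_t height) {
    assert(px.size() == width * height);
    Integral ii{width, height,
                std::vector<std::uint32_t>((width + 1) * (height + 1), 0)};
    for (std::size_t y = 0; y < height; ++y) {
        std::uint32_t row = 0;  // running sum of the current row
        for (std::size_t x = 0; x < width; ++x) {
            row += px[y * width + x];
            ii.sum[(y + 1) * (width + 1) + (x + 1)] = ii.at(x + 1, y) + row;
        }
    }
    return ii;
}

// Sum of the pixels in [x, x + w) x [y, y + h) using four table lookups.
std::uint32_t rectSum(const Integral& ii, std::size_t x, std::size_t y,
                      std::size_t w, std::size_t h) {
    assert(x + w <= ii.width && y + h <= ii.height);
    return ii.at(x + w, y + h) - ii.at(x, y + h) - ii.at(x + w, y) + ii.at(x, y);
}

// Two-rectangle (vertical edge) Haar-like feature: left half minus right half.
std::int64_t haarEdge(const Integral& ii, std::size_t x, std::size_t y,
                      std::size_t w, std::size_t h) {
    assert(w % 2 == 0);
    const std::size_t half = w / 2;
    return static_cast<std::int64_t>(rectSum(ii, x, y, half, h)) -
           static_cast<std::int64_t>(rectSum(ii, x + half, y, half, h));
}
```

For a 4×4 tile whose left two columns are 10 and right two columns are 2, `rectSum(ii, 0, 0, 4, 4)` is 96 and `haarEdge(ii, 0, 0, 4, 4)` is 64.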

Cascade classification

The cascade classifier limits its search to the vicinity of any faces detected in the previous frame (and surrounding areas), skin-coloured areas identified by thresholding the probability map, and regions selected by the random candidate generator.
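The three sources of search regions can be combined into one candidate list, roughly as sketched below. The `Rect` type, the `margin` used to cover the surroundings of a previous detection, and the fixed `regionSize` of the random regions are all assumptions made for the example; the article does not give these details.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

struct Rect { int x, y, w, h; };

// Grows a previous detection by `margin` pixels on each side, clipped to the frame.
Rect expand(Rect r, int margin, int frameW, int frameH) {
    const int x0 = std::max(0, r.x - margin);
    const int y0 = std::max(0, r.y - margin);
    const int x1 = std::min(frameW, r.x + r.w + margin);
    const int y1 = std::min(frameH, r.y + r.h + margin);
    return {x0, y0, x1 - x0, y1 - y0};
}

// Collects the regions the cascade should search in the current frame:
// surroundings of previous faces, skin-coloured regions, and random regions.
std::vector<Rect> buildCandidates(const std::vector<Rect>& prevFaces,
                                  const std::vector<Rect>& skinRegions,
                                  int randomCount, int regionSize, int margin,
                                  int frameW, int frameH, std::mt19937& rng) {
    assert(regionSize > 0 && regionSize <= frameW && regionSize <= frameH);
    std::vector<Rect> out;
    for (const Rect& f : prevFaces) {
        out.push_back(expand(f, margin, frameW, frameH));
    }
    out.insert(out.end(), skinRegions.begin(), skinRegions.end());
    std::uniform_int_distribution<int> distX(0, frameW - regionSize);
    std::uniform_int_distribution<int> distY(0, frameH - regionSize);
    for (int i = 0; i < randomCount; ++i) {
        const int x = distX(rng);
        const int y = distY(rng);
        out.push_back({x, y, regionSize, regionSize});
    }
    return out;
}
```

The random regions are what allow a face entering the frame outside any tracked or skin-coloured area to be picked up within a few frames.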

In comparison to the sequential sliding window approach required by a CPU, the GPU work-items can evaluate multiple windows in parallel. A property of the algorithm is that some evaluations complete much sooner than others, each window requiring anywhere from one to one hundred stages of computation. To maintain parallelism, when a work-item finishes evaluating one window it starts evaluating another.
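One common way to keep work-items busy when evaluation times vary is a shared counter from which each work-item claims its next window. The C++ sketch below emulates that pattern on the host with `std::atomic`; it is an assumption about how such scheduling could look, not Imagination's kernel code. `StageScore` is a hypothetical callback standing in for the Haar-feature sums of a real cascade stage.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a stage's feature response on a given window.
using StageScore = std::function<float(std::size_t window, std::size_t stage)>;

// Runs the cascade on one window, stopping at the first failed stage.
// Returns the number of stages actually evaluated.
std::size_t evaluateWindow(std::size_t window,
                           const std::vector<float>& thresholds,
                           const StageScore& score, bool& accepted) {
    for (std::size_t s = 0; s < thresholds.size(); ++s) {
        if (score(window, s) < thresholds[s]) {
            accepted = false;  // early exit: most non-face windows end here
            return s + 1;
        }
    }
    accepted = true;
    return thresholds.size();
}

// One persistent work-item: as soon as a window is finished, claim the next
// unprocessed index from the shared counter until none remain.
void workItem(std::atomic<std::size_t>& next, std::size_t windowCount,
              const std::vector<float>& thresholds, const StageScore& score,
              std::vector<std::uint8_t>& accepted) {
    assert(accepted.size() == windowCount);
    for (;;) {
        const std::size_t w = next.fetch_add(1, std::memory_order_relaxed);
        if (w >= windowCount) return;
        bool ok = false;
        evaluateWindow(w, thresholds, score, ok);
        accepted[w] = ok ? 1 : 0;  // each index is written by exactly one work-item
    }
}
```

On a host, several threads could each run `workItem` against the same counter. The results use one byte per window rather than `std::vector<bool>` so that concurrent writes to different windows do not touch the same memory location.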

Find regions with skin colour

The skin region detector finds areas of the probability map that have high probability, passing these coordinates to the cascade classifier.
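A minimal sketch of this step, assuming a simple fixed threshold: scan the quarter-resolution probability map and report each high-probability pixel in source-image coordinates. The `threshold` value and the `Point` type are illustrative; the scale factor of 4 follows from each probability-map pixel covering a 4×4 block of the source image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { int x, y; };

// Returns the source-image coordinates (top-left of the matching 4x4 block)
// of every probability-map pixel at or above the threshold.
std::vector<Point> findSkinPixels(const std::vector<float>& prob,
                                  std::size_t width, std::size_t height,
                                  float threshold, int scale = 4) {
    assert(prob.size() == width * height);
    std::vector<Point> out;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            if (prob[y * width + x] >= threshold) {
                out.push_back({static_cast<int>(x) * scale,
                               static_cast<int>(y) * scale});
            }
        }
    }
    return out;
}
```

A production detector would likely merge neighbouring pixels into regions before handing them on; this sketch leaves that grouping out.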

Zero-copy implementation

The CPU code is implemented in C++ and the GPU kernels are implemented in OpenCL. As shown in the diagram below, an Android demonstration application is created using the PowerVR imaging framework (introduced in a previous article in this series). This framework enables the face detection algorithm to be efficiently pipelined across the ISP, GPU and CPU, making use of shared zero-copy memory and cache allocations that minimize synchronization overheads.

Figure: Creating an Android app using the PowerVR imaging framework

When integrated into an application based on the PowerVR Imaging Framework SDK, Imagination’s optimized face detection algorithm can detect up to four faces in real time at 1080p and 30 fps using a two-cluster GPU clocked at 200 MHz. This leaves plenty of headroom to add other tasks to the software pipeline, such as image stabilization beforehand and beautification afterwards, while still achieving 1080p30 performance on many mobile and tablet products available today.

Concluding remarks

Imagination’s hardware portfolio enables silicon vendors to create devices that deliver best-in-class performance while operating under a tight power and thermal envelope. Its PowerVR GPUs provide the performance and flexibility needed to accelerate both graphics and data-parallel computations across many mobile and embedded devices in the market today.

By pairing Imagination hardware with the PowerVR Imaging Framework, designers can now harness the performance available across their target SoC, including the GPU, ISP, CPU, video codecs and hardware accelerators. Imagination’s close collaboration with strategic OEMs (and in some cases their third-party software partners) has already helped bring to market new computational photography and computer vision use cases that intelligently distribute the required computations across the available heterogeneous hardware components.

