Convolutional neural networks in action

These days we seem to take it for granted how powerful and sophisticated computers have become. We can talk to our phones and our Bluetooth speakers and they will respond with context-aware information; in certain cars you can take your hands off the wheel and let yourself be carried down the road by electronics, and we can share messages and pictures with anyone anywhere in the world at the touch of a button.

But one area where our devices are still very much in their infancy is computer ‘vision’. We carry ever-better cameras in our pockets, but in terms of understanding the world these devices remain relatively dumb: they can see with ever greater clarity, yet they can’t understand what they are seeing.

For example, show a three-year-old child an image of a person standing next to an elephant and they will have no trouble telling you what they see, but for a computer to do the same is extremely challenging.

Computers can only recognise objects once they’ve been trained using large image datasets

However, things are changing. In recent years a field of computing called ‘deep learning’ has greatly enhanced the ability of computers to understand as well as to see. Rather than relying on traditional image processing techniques, deep learning, and specifically the use of convolutional neural networks (CNNs), is beginning to make significant inroads into giving computers the ability to make sense of the world.

Convolutional neural networks were first pioneered back in the late 1980s, building on earlier work from the 1960s on artificial neural networks (ANNs) and multilayer perceptrons (MLPs). They were originally designed to work in a way loosely inspired by the human brain and, much like a human brain, they need lots of data on which to be trained in order to do their job well.

CNNs became more widely known and used from around 2005 with the rise of modern GPUs, whose ability to process highly parallel, repetitive workloads at speed made CNNs practical to run.

Work on giving computers visual intelligence made a significant leap in 2012, when Alex Krizhevsky used a convolutional neural network to win the ImageNet challenge. ImageNet is a huge database of millions of images, created in 2007 by Professor Fei-Fei Li, then at Princeton University, to provide computers with enough training data to help them learn in the same way a child would. The ImageNet challenge is commonly described as the annual Olympics of computer vision: it tests how well a computer can learn to understand what it is seeing from a large selection of images; the fewer errors, the better the score.

The AlexNet CNN made a large impact in 2012 by rapidly increasing image recognition performance

At the time, Krizhevsky reduced the error rate from 26% to 15% – a major improvement, made possible by the use of a convolutional neural network. Each year the state of the art improves further as teams create ever better systems to speed up and sharpen the ability of devices to understand images.
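The metric behind these percentages is ImageNet’s ‘top-5 error’: a prediction counts as correct if the true label appears among the model’s five highest-scoring classes. A minimal sketch of how that metric is computed (the code and names here are illustrative, not taken from any of the systems mentioned):

```python
import numpy as np

def top5_error(scores, labels):
    """Fraction of samples whose true label is NOT among the five
    highest-scoring predictions -- ImageNet's 'top-5 error'."""
    top5 = np.argsort(scores, axis=1)[:, -5:]   # indices of the 5 best scores per sample
    hits = [label in row for row, label in zip(top5, labels)]
    return 1.0 - np.mean(hits)

# Toy scores for 2 samples over 10 classes: higher class index scores higher,
# so classes 5-9 form the top five for every sample.
scores = np.tile(np.arange(10.0), (2, 1))
labels = np.array([9, 0])          # first label is in the top 5, second is not
print(top5_error(scores, labels))  # 0.5
```

On the real challenge this is evaluated over tens of thousands of held-out test images, which is why a drop from 26% to 15% represented such a decisive win.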

The performance of CNNs on ImageNet has developed rapidly in recent years.

But how are CNNs being used in the real world today and what impact are they having?

Assistive technology

In a famous scene from 2001: A Space Odyssey, astronauts David Bowman and Frank Poole hide in a pod where HAL, the ship’s computer, cannot hear them discussing his odd behaviour. However, HAL is able to read their lips and works out that they are going to deactivate him – with infamous results. Today, he might well have used a CNN to decipher what they were saying. There are more down-to-earth uses for a lip-reading computer, such as producing transcripts from video content where the audio is unavailable – for example, journalists obtaining off-mic comments from politicians or celebrities.

HAL 9000 from the 1968 film 2001: A Space Odyssey depicted a lip-reading computer.

A group from the University of Oxford has proposed using a CNN for exactly this, while another paper submitted to the IEEE proposes using a CNN to “reduce the negative influence caused by shaking of the subject and face alignment blurring at the feature-extraction level”. It produced a word recognition rate of up to 71.76%, far superior to conventional methods.

You can also see the power of CNNs running in the palm of your hand today. An app called AIPoly, designed to assist blind and partially sighted people, leverages an Imagination PowerVR GPU to identify objects through the smartphone camera and say out loud what they are.


Automotive

CNNs are closely associated with automotive, but actively using them to power self-driving cars is still a work in progress. This paper from Cornell University discusses how CNNs can be used to recognise car license plates, delivering better results than conventional approaches. Of course, license plates aren’t as unpredictable as moving objects such as pedestrians, but another paper discusses using CNNs to detect pedestrians with improved efficiency over previous methods.

When it comes to those pesky moving objects known as people, CNNs are also expected to play a key role as the foremost type of algorithm used in ADAS and autonomous vision systems in cars. CNNs are extremely efficient at analysing a scene and breaking it up into its recognisable components, so that objects, people, cars, trucks, road kerbs and road signs can all be recognised through a camera-based system. By using vast amounts of training data, the convolutional network can ‘learn’ what to look for and extract it from a scene while driving in real time. As an example, the early layers of a CNN may detect corners and curves, later layers circles, then road signs and, finally, what the road sign means. This output is then passed to a sensor fusion stage, which combines inputs from other sensors such as LiDAR or radar to make sense of the bigger picture, and then acts upon them, either by flashing a warning via the Multi Media Interface or by taking control of braking and/or steering.
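The layer-by-layer recognition described above is built from convolutions: small filters slid across the image, whose responses highlight local patterns such as edges. A minimal sketch of a single convolution with a hand-written vertical-edge filter (a trained CNN learns such filters from data rather than having them hand-coded):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel across the image
    and sum the element-wise products at each position."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A tiny image with a vertical edge: dark columns on the left, bright on the right.
image = np.array([[0, 0, 0, 1, 1]] * 5, dtype=float)

# A hand-crafted vertical-edge filter.
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

response = conv2d(image, kernel)
print(response)  # zero over the flat region, strongly positive across the edge
```

In a real CNN, many such filters are applied per layer, their outputs passed through a non-linearity and pooled, and the filters in deeper layers respond to combinations of the earlier features – which is how simple edges build up to road signs and their meanings.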


CNNs can be implemented on a CPU, using GPU compute – which is much more efficient, often by at least a factor of ten – or with dedicated hardware acceleration, which ultimately yields the highest performance at the lowest power and silicon footprint.


Medical

By their very nature, CNNs are very good at detecting patterns, making them well suited to assisting in medical situations. As this article discusses, they can be very effective in increasing the accuracy of recognising cancers, and have been used for “primary breast cancer detection, glioma grading and epithelium and stroma segmentation”. Their efficiency means they can reduce the workload for pathologists, and the paper concludes that “‘deep learning’ holds great promise to improve the efficacy of prostate cancer diagnosis and breast cancer staging.”


Equally, a paper from Cornell University on using CNNs to aid breast cancer screening considers the issues that arise when training images are down-sampled, and suggests that image resolution must be maintained to ensure the best performance.


Semiconductor manufacturing

If you’re afraid of the thought of computers building themselves, then you might have cause to worry: the semiconductor industry is looking at using deep learning to aid the design and manufacture of advanced integrated circuits. CNNs are seen as well suited to solving certain manufacturing problems. In a similar vein to identifying cancers, their ability to spot patterns will be put to good use in the lithography process, greatly reducing manufacturing defects and helping to increase yields.

CNNs are also being used to recognise foods. This paper discusses using a CNN for automatic diet recognition, enabling specialists to discover unhealthy eating patterns. Several papers describe using CNNs in this way; one refers to its system as ‘DeepFood’, for computer-aided dietary assessment to improve health and longevity.

Social Media

Making digital images look great is a skill that many people spend a lot of time perfecting through careful use of image retouching tools. An experimental process from Adobe and Cornell University called “Deep Photo Style Transfer” is looking to make those people redundant by applying artificial intelligence. The technique can take the style of one photo and automatically apply it to another, with dramatic results.


CNNs are also widely used by sites such as Facebook. Here the company describes how they use one in DeepText, which they describe as “a deep-learning-based text understanding engine that can understand with near-human accuracy the textual content of several thousand posts per second, spanning more than 20 languages.”


Imagination is naturally looking closely at ways of accelerating the use of inference engines – that is, the running of CNNs on devices once they have been fully trained on datasets. As we demonstrated last year, our PowerVR Rogue GPUs already offer 3x greater efficiency and up to 12x faster performance than running on a CPU, and our new PowerVR Furian architecture will offer even greater performance and power efficiency.

One of our recent blog posts highlights our work in this area and how we are the first to make use of the CNN extension in OpenVX, the open standard API for computer vision.

We are continuing to work in this sphere, and Imagination’s Paul Brasnett recently spoke at the Embedded Vision Summit on the subject of ‘Training CNNs for Efficient Inference’. In his presentation, he explained Imagination’s approach to improving the efficiency of running CNNs on hardware where power and area constraints are of primary concern, such as on mobile devices and in automotive.

It’s an exciting time for computer vision, and Imagination will be at the heart of it. We look forward to bringing you news regarding upcoming products that will make even greater strides in this area in the coming months.

Benny Har-Even

With a background in technology journalism stretching back to the late 90s, Benny Har-Even has written for many of the top UK technology publications, across both consumer and B2B and has appeared as an expert on BBC World Business News and BBC Radio Five Live. He is now Content Manager at Imagination Technologies.
