Deep learning on mobile devices at the Embedded Vision Summit 2016

It was clear last week at the annual Embedded Vision Summit in Santa Clara that the time of computer vision and deep learning on mobile had finally arrived. Interest in the area is growing noticeably – the Summit program expanded from one to two days this year, there were an impressive number of attendees, and the Technology Showcase was busy throughout the show.

The overall industry sentiment seems to be that while vision remains a difficult challenge, it is becoming an increasingly solvable problem. There is active development of vision processing applications across multiple market segments, including automotive, surveillance, photography and consumer electronics.

One of the interesting points made during the show came in a presentation by Cadence Design Systems showing the incredible growth in the number of sensors – cameras, gyroscopes, thermometers, proximity sensors and more – in today’s devices. When you look at the amount of data these sensors generate, the chart is suddenly dominated by the massive number of pixels coming from cameras and other vision sensors.

Cadence highlighted the massive amount of data generated by images

Visual information is the highest bandwidth, most information-rich way in which consumers interact with the world. The challenge is in analysing and using that visual data, since it has the potential to absorb almost unlimited computing resources and huge amounts of bandwidth. Processing this data will increasingly require optimised solutions from sensor to output, to minimise power and bandwidth whilst delivering the required performance – and I’m very excited to see our latest PowerVR multimedia IP developments specifically targeting these applications.

Jeff Dean from the Google Brain team pointed out that with neural networks, results get better with more data, bigger models, and more computation. He described the deep learning work done by the Google Brain project since 2011, with unique project directories now numbering ~1,200, used for Android apps, drug discovery, Gmail, image understanding, maps, natural language understanding, photos, robotics research, speech, translation, YouTube and more. With all of this work going on, results are definitely getting better. See the chart below – the latest GoogLeNet Inception architecture now scores better on image recognition than humans!

Google presented the rapid progress in image recognition with neural networks

From a technology perspective, one of the biggest focuses at this year’s show was convolutional neural networks (CNNs). For those of you not immersed in the world of vision processing, CNNs are a form of machine learning that models the manner in which the brain’s visual cortex identifies and distinguishes objects.
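To make that idea a little more concrete, here is a minimal, illustrative sketch in Python/NumPy (not drawn from any of the Summit talks) of the core operation a CNN layer performs: sliding a small learned filter across an image to produce a feature map.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 2D convolution (strictly cross-correlation, as used in CNNs)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Each output value is a weighted sum of a small image patch
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A simple vertical-edge filter applied to a random "image"
image = np.random.rand(8, 8)
edge_filter = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])
feature_map = conv2d(image, edge_filter)
print(feature_map.shape)  # (6, 6)
```

In a real CNN, many such filters are learned from data and stacked layer upon layer, so that early layers respond to edges and textures while deeper layers respond to whole objects.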

CNNs can be applied to computational photography, computer vision, AR/VR and more. Google are one of the leaders in this field, and their open-sourcing of the TensorFlow engine aims to further the development of models, software and hardware by multiple players in the industry. Beyond Google’s tools, other frameworks such as Caffe are also freely available, letting developers take advantage of these rapid advances in deep learning.
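As a flavour of what working with these frameworks looks like, here is a minimal, illustrative sketch of a small image-classification CNN defined with TensorFlow’s Keras API – the layer sizes and class count are arbitrary, not taken from any model shown at the Summit.

```python
import tensorflow as tf

# A small illustrative CNN: two conv/pool stages followed by a classifier head
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),  # e.g. 10 object classes
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```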

At the Summit, my colleague Paul Brasnett presented “Efficient Convolutional Neural Network Inference on Mobile GPUs”. Paul talked about how it’s possible today to run state-of-the-art CNN algorithms on the millions of PowerVR mobile GPUs already shipping in a range of SoCs across numerous markets. For applications like computational photography, augmented reality, object recognition and more, the performance is quite remarkable: compared to mobile CPUs, PowerVR GPUs offer up to 3x higher efficiency and up to 12x higher performance when deploying CNNs. Newer CNN architectures with smaller fully connected layers help to make more efficient use of compute resources. During the Technology Showcase, we demonstrated this concept – see below.
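To illustrate why shrinking the fully connected layers matters, here is a rough back-of-the-envelope parameter count for an AlexNet-style network (this is my own illustration, not a figure from Paul’s presentation; it ignores grouping, biases and exact layer variants):

```python
# Rough parameter-count comparison for an AlexNet-like network
conv_layers = [
    # (kernel_h, kernel_w, in_channels, out_channels)
    (11, 11, 3, 96),
    (5, 5, 96, 256),
    (3, 3, 256, 384),
    (3, 3, 384, 384),
    (3, 3, 384, 256),
]
fc_layers = [
    # (inputs, outputs)
    (6 * 6 * 256, 4096),
    (4096, 4096),
    (4096, 1000),
]

conv_params = sum(kh * kw * cin * cout for kh, kw, cin, cout in conv_layers)
fc_params = sum(i * o for i, o in fc_layers)

print("conv parameters: %.1fM" % (conv_params / 1e6))  # ~3.7M
print("fc parameters:   %.1fM" % (fc_params / 1e6))    # ~58.6M
```

The fully connected layers hold the overwhelming majority of the weights, so architectures that shrink or replace them reduce the memory footprint and bandwidth that dominate mobile inference.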

Imagination demonstrating CNNs running on a PowerVR Rogue GPU

Our demo used a Google Nexus Player with an Intel Atom quad-core SoC containing a PowerVR G6430 GPU. The application runs AlexNet on the Caffe framework, accelerated using OpenCL. The demo uses live input from the camera to identify the objects the camera is pointing at, along with an indication of the confidence level produced by the network. The performance is several times higher than a CPU running the same network.
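For readers who want to try something similar on their own hardware, a single classification pass through Caffe’s Python interface looks roughly like the sketch below. The file names and the input image are illustrative placeholders; our demo itself ran the network through an OpenCL-accelerated Caffe build on the PowerVR GPU rather than this plain Python path.

```python
import caffe

# Load a trained AlexNet: deploy definition plus learned weights (placeholder paths)
net = caffe.Net("deploy.prototxt", "alexnet.caffemodel", caffe.TEST)

# Preprocess an input frame to match the network's expected layout
transformer = caffe.io.Transformer({"data": net.blobs["data"].data.shape})
transformer.set_transpose("data", (2, 0, 1))      # HWC -> CHW
transformer.set_raw_scale("data", 255)            # [0,1] -> [0,255]
transformer.set_channel_swap("data", (2, 1, 0))   # RGB -> BGR

frame = caffe.io.load_image("frame.jpg")          # stand-in for a live camera frame
net.blobs["data"].data[...] = transformer.preprocess("data", frame)

# Forward pass: the 'prob' blob holds one confidence value per class
probs = net.forward()["prob"][0]
top = probs.argmax()
print("class index %d, confidence %.2f" % (top, probs[top]))
```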

Embedded vision is exploding across a wide range of applications, from computational photography and gaming to AR/VR and robotics, to smart cars and drones, and far, far beyond. With the huge leaps being made in this area, it will be amazing to see where we are as an industry at next year’s Summit.

At Imagination, we’re continuing to drive these technologies across our IP portfolio: visit our PowerVR Graphics and PowerVR Vision websites for more information. Make sure you also follow us on Twitter (@ImaginationTech, @GPUCompute and @PowerVRInsider) for more news and announcements from Imagination.

Do you think that CNNs will be the dominant method used for vision processing in your application area?
