In December last year, Imagination announced that we were the first to submit a conformant implementation of OpenVX 1.1. In this blog post, we will show how our work has developed since then, covering one of the first implementations of the Khronos OpenVX 1.1 API as well as the very first implementation of the Convolutional Neural Network (CNN) extension that accompanies it.
First, a bit of background. OpenVX is an API developed by the Khronos Group, of which Imagination Technologies is an active Promoter member. The API provides developers with a standard, efficient way to write applications that perform vision operations on an image; examples of these operations include “edge detection” and “thresholding”.
The OpenVX API enables users to chain these operations together for purposes such as detecting corners in an image or adjusting an image’s perspective. This makes developing applications quicker and easier than implementing them in a general-purpose compute API such as OpenCL, which is not targeted at vision operations. Our implementation runs on OpenCL-capable PowerVR GPUs.
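OpenVX itself is a C API in which nodes are added to a graph, verified once, and then executed repeatedly. The pure-Python sketch below is only a conceptual illustration of that chaining idea; none of these names are part of OpenVX, and the two operations are deliberately simplistic stand-ins for real vision kernels:

```python
# Conceptual sketch of chaining vision operations into a graph.
# (Illustrative only: real OpenVX is a C API with context/graph/node
# objects; every name here is ours, not part of the standard.)

def threshold(image, level=128):
    """Binary threshold: 255 where a pixel exceeds `level`, else 0."""
    return [[255 if p > level else 0 for p in row] for row in image]

def edges(image):
    """Crude edge detection: mark pixels that differ from their right neighbour."""
    return [
        [255 if row[x] != row[x + 1] else 0 for x in range(len(row) - 1)]
        for row in image
    ]

class Graph:
    """A minimal dataflow graph: nodes run in order, each output feeding the next input."""
    def __init__(self, nodes):
        self.nodes = nodes

    def process(self, image):
        for node in self.nodes:
            image = node(image)
        return image

# Build the pipeline once, then run it on incoming frames.
graph = Graph([threshold, edges])
result = graph.process([[10, 200, 200, 10],
                        [10, 200, 200, 10]])
# → [[255, 0, 255], [255, 0, 255]]
```

Constructing the graph up front mirrors one of OpenVX’s design points: the implementation can validate and optimise the whole pipeline once, then execute it many times per second on live frames.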
Read this blog post to find out more about how Imagination created the first conformant implementation of OpenVX 1.1.
One advantage of this API for developers is that an application written for one OpenVX implementation should work on any other. For example, one implementation might run on the CPU of a system without a GPU; because the API is standard, the same application should run on both platforms without modification.
Another first is our implementation of the CNN extensions for OpenVX.
What is a CNN? CNN stands for ‘Convolutional Neural Network’, a type of machine learning used in many different areas, one of which is image recognition. Imagination has implemented this extension for OpenVX, and we have created a demo to show off the feature running on a PowerVR GPU.
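The operation that gives CNNs their name is the convolution: a small kernel of weights is slid across the image, producing a weighted sum at each position that responds to local patterns such as edges. A minimal pure-Python sketch (not the demo’s actual code) of a single 2D convolution:

```python
# Minimal sketch of the convolution at the heart of a CNN: slide a small
# kernel over an image and compute a weighted sum at each position.
# (Illustrative only; a real CNN stacks many such layers with learned weights.)

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            row.append(acc)
        out.append(row)
    return out

# A hand-written vertical-edge kernel: responds strongly where intensity
# changes from left to right, as at the boundary below.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]
print(conv2d(image, vertical_edge))  # prints [[27, 27]]
```

In a trained network, the kernel weights are not hand-written like this one; they are learned from labelled examples such as the MNIST digits used in our demo.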
First, we use standard vision operations in OpenVX to find the bounding box of the image the user has drawn. Then we run a simple LeNet CNN graph, which lets us estimate what the user’s input represents. In this demo we are recognising numerical digits, and we have trained the network on the MNIST dataset of handwritten digits.
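The first stage described above amounts to locating the user’s ink so the digit can be cropped and rescaled before classification. The demo does this with standard OpenVX vision operations; the pure-Python sketch below (our own illustrative code, not the demo’s) shows the equivalent logic:

```python
# Sketch of the pre-processing stage: find the bounding box of all
# nonzero ("ink") pixels in a binary canvas, so the drawn digit can be
# cropped and rescaled before being fed to the classifier.
# (Illustrative only; the demo uses OpenVX vision operations for this.)

def bounding_box(image):
    """Return (min_x, min_y, max_x, max_y) of all nonzero pixels, or None."""
    xs = [x for row in image for x, p in enumerate(row) if p]
    ys = [y for y, row in enumerate(image) if any(row)]
    if not xs:
        return None  # blank canvas: nothing drawn yet
    return (min(xs), min(ys), max(xs), max(ys))

canvas = [[0,   0,   0, 0, 0],
          [0,   0, 255, 0, 0],
          [0, 255, 255, 0, 0],
          [0,   0,   0, 0, 0]]
print(bounding_box(canvas))  # prints (1, 1, 2, 2)
```

Cropping to this box before rescaling keeps the digit centred and consistently sized, which matters because the MNIST-trained network expects its input in a normalised frame.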
Of course, we could extend this to recognise a variety of things; for example, a similar network could be applied to audio to build an application that does voice recognition in real time. Running on the GPU rather than the CPU means better efficiency and longer battery life on mobile, and unlike an equivalent application written directly in OpenCL, this application should be portable between implementations. The use of OpenVX meant we were able to implement this demonstration in relatively few lines of code and with only tens of man-hours of effort, compared with the hundreds or thousands it would take to develop from scratch.
Our graph is composed of a very simple set of operations so far, but it can be extended to more complicated graphs and to deep learning algorithms. The demo currently runs on general-purpose hardware, but we can envisage an easy step forward in which dedicated hardware streamlines certain operations. This would further improve performance and efficiency compared to general-purpose processors, allowing more algorithms to run on mobile devices rather than in the cloud, which would greatly improve latency and remove the need for a network connection.
In the above image, you can see we have drawn a very tall and thin ‘8’ character, and that both the ‘1’ and ‘8’ nodes show high activation. If we continued to make this ‘8’ taller and thinner, the network would break down and incorrectly label the result as a ‘1’. We could avoid this by training the graph on such edge cases; however, for simplicity, this demo is intended to show the OpenVX 1.1 API and the extension we have implemented.
Above is a video of the demo in action. You’ll notice that we draw the number four as a closed digit, rather than as a two-stroke ‘open’ four. This is because, at present, the graph does not recognise the open ‘4’ character well, most likely because the training dataset uses other ways of drawing this digit, such as the example below. If we extended the dataset to include these styles of writing, the network would become more robust.
We are showing this demo in person at the Embedded Vision Summit in California from 1st to 3rd May 2017. Come join us for your opportunity to talk about how we can help you with your vision or machine learning problem. Imagination’s Paul Brasnett will also be talking at the summit on the subject of ‘Training CNNs for Efficient Inference’.