Once again, we bring you another of our Visionary series of interviews with key thinkers at the company. This time we’re in conversation with Paul Brasnett, senior research manager for the PowerVR Vision & AI division. Paul leads three teams at Imagination: two research the latest developments in the rapidly moving world of neural network acceleration, and a third looks at the company’s chip designs. At the forefront of one of the hottest areas in tech right now, Paul is uniquely placed to provide informed insights into the machine learning space.
Paul, to start, please tell me about your background and the path that led you to Imagination.
I did an undergraduate degree in engineering mathematics at the University of Bristol, and from there I did a PhD in computer vision, where I spent time looking at tracking objects within video. After my PhD, I joined Mitsubishi Electric, who had a research lab in the UK, down in Surrey. There I worked on a range of products, but a major part of what I was doing was looking at image and video duplicate detection, and that led to some work around the MPEG-7 standards. That is all about the metadata around multimedia content: looking at what’s contained within an image or giving some sort of characteristics to the multimedia content. I joined Imagination in around 2011, around the time that the Series6 GPU was in development. These were the first PowerVR GPUs that were really designed from the ground up for compute as well as graphics.
One of the things I was looking at was what else you can use the GPU for, apart from graphics. The work was to understand what sort of workloads might map well: can we take algorithms and map them to the GPU, and how can we make those algorithms better? But we were also trying to feed back into the development process for GPUs, to see how we could make the hardware better for the sorts of workloads we might want to run.
From there, we were testing and playing around with techniques such as face detection and video processing, and then we started to look at what we could do around tasks such as object tracking and object detection. Next, we looked at whether the GPU or dedicated hardware was the right approach for specific use cases, such as efficiently running object detection on embedded devices.
It was around this time that CNNs started to take off. Originally, we’d been looking at traditional computer vision techniques such as HOG and SVM – histograms of oriented gradients and support vector machines – that sort of thing.
Then we started to increasingly investigate convolutional neural networks. They looked promising in terms of inference accuracy, with networks such as AlexNet offering an increase in accuracy over traditional techniques. However, the compute loads looked very high. The question then was: if you wanted to deploy this on a GPU, what size of GPU would you need to run these workloads in real time? It looked prohibitively large.
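To make the compute-load point concrete, here is a back-of-envelope sketch (not Imagination’s own analysis) counting the multiply-accumulate (MAC) operations in just the first convolutional layer of AlexNet, using the layer shape from the original AlexNet paper: a 224×224×3 input, 96 filters of 11×11, stride 4, producing a 55×55 output.

```python
# Rough MAC count for a convolutional layer: one multiply-accumulate
# per kernel weight, per output element.
def conv_macs(out_h, out_w, out_channels, k_h, k_w, in_channels):
    return out_h * out_w * out_channels * k_h * k_w * in_channels

# AlexNet conv1: 55x55 output, 96 filters, 11x11 kernel, 3 input channels.
macs = conv_macs(55, 55, 96, 11, 11, 3)
print(f"{macs:,} MACs")  # prints "105,415,200 MACs"
```

Over 100 million MACs for a single layer, per frame – multiplied across all the layers of a network and a real-time frame rate, it becomes clear why a GPU sized for graphics alone struggles with this class of workload.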
The next question was what we could do to develop dedicated hardware for convolutional neural networks. Ultimately, that led to our work on the PowerVR Series2NX neural network accelerator (NNA), which launched in September last year.
So, that was the culmination of quite a few years of work, initially looking at the GPU and then, building up the sort of skills and knowledge across the team and ultimately developing the PowerVR Series2NX.
Were you always working with an aim to create dedicated hardware or was there a moment in your research when you realised that dedicated hardware would be the way to go?
As we started to discuss things with customers we got some performance targets, and we realised that in certain scenarios the GPU was not going to reach the desired performance per square millimetre or performance per watt. So we saw that there were opportunities for doing things in a different way to a GPU and started bouncing a few ideas around regarding architecture. It became clear fairly quickly that you could do things significantly better if the hardware was very tightly optimised for the CNN space.
Why can’t the CNN processing be done in the cloud? Why do we need processing in an edge device?
I think there are a range of reasons. It depends on the market. If you look at mobile phones, for example, I think there’s rightly a lot of concern around privacy and data protection. Sending significant amounts of data from your phone to be processed in the cloud is not ideal, and keeping the processing on device helps to protect your privacy.
Also, in mobile, if you’re sending a lot of data to the cloud, there’s a huge cost for the providers of whatever service you’re using to process it there. So, actually, they’re quite keen for things to be moved locally. Similarly, if you look at smart cameras in the home for monitoring your front door, you don’t want to be streaming video from your home to the cloud 24 hours a day, seven days a week. So it will require some smart processing on the device, possibly backed up by some processing remotely in the cloud.
For autonomous driving, the value is in low latency. If you’re processing your camera data, you want to know what’s happening around you, and that needs to be done as fast as possible. If you have to stream data to the cloud, do the processing, make decisions and send the results back, that’s likely to be too high a latency for the sort of responses you need in autonomous driving.
So what do you and your team do on a day-to-day basis?
Actually, there’s a huge amount of effort in keeping up with the state-of-the-art. There’s a lot of research going on across the world right now in the machine-learning and CNN space, and just keeping up with the latest developments consumes quite a lot of time. But what we are always trying to do is to understand what the benefits are that come from a certain new feature or a new way of training; what does it really bring? As a company, we try to focus on efficiency – we’re not interested in an incremental gain for something that’s ten times the cost. Rather, can you achieve the same accuracy at a lower cost, or better accuracy for a given compute cost? Those are the things we’re interested in.
We spend time understanding what other people are doing, but also playing around with new ideas ourselves and trying to understand what it means for our products if we want to support certain algorithms. How do we need to be developing our hardware and software to support those? So we’re also looking at doing our own algorithm research and trying to add value into the algorithm space. It all helps to drive the requirements for the hardware and software products we develop.
So how does what you do impact final chip designs?
Well, as well as leading the research teams, I also lead an architecture team. There needs to be close collaboration between them – the architecture is driven by algorithmic understanding. That understanding also helps to prioritise what we really need to care about in the hardware. If certain features only give an incremental gain, then you don’t need to put down specific functionality for them in the hardware.
We have input from customers as well, so there may be some specific features and networks they require that feed directly into products. The research team will work with them to understand the requirements and to determine whether they really want specific networks, or the applications that run on those networks. There may be better ways we can help customers get the same performance more efficiently using our hardware.
The vision and AI market is moving so quickly. How do you cope with that?
It’s challenging, right? It is an area that is moving very fast. When the hardware is developed, we’re trying not to develop point solutions. What I mean by that is that we aren’t trying to develop hardware for one specific network or one very specific requirement. We try to look at a set of requirements to come up with a generalised solution to a number of related requirements. We’re trying to understand and predict where we might be able to add some flexibility, and some options, that allow increased functionality in the future.
We’re not necessarily constraining the design to just what’s needed today. We’re trying to look beyond that. Ultimately, there’s a trade-off in terms of how much flexibility is put into a design and the area and performance budgets that you can hit.
There has been much debate about whether what we have today in terms of neural networks is actually AI. What’s your take on that? And if it isn’t, what are the challenges that we face in trying to get there?
Well no, I think what we have is not true AI. What we have today are some clever machine learning algorithms and a lot of compute. It may have some characteristics of intelligence, but I think the fundamental thing that’s missing today is the ability to generalise. So, a system can learn to recognise a cat, but if it sees a cat that looks different in some way, that’s something the machine learning we have today can’t necessarily deal with very well.
In that sense, it’s relatively narrow in its application area. You can train it to do certain things, but beyond that, it doesn’t extrapolate. I think that’s where the challenges lie. And there’s some interesting work going on around the world in that area, but I think there’s quite a long way still to go.
What sort of timescale would you put on the arrival of true AI?
We always say 20 years! I think it’s always 20 years, right?
We’ve got Stephen Hawking and Elon Musk tolling the bell on AI. Should we be frightened of it when we get there? What are your feelings about it?
There are threats that can be seen in different ways, right? In my PhD, in the traditional computer vision space, a lot of the work was based on a theoretical understanding of signal processing, to which a bit of human intelligence is applied to work out the problem you’re trying to solve. You then develop an algorithm and, finally, you test that algorithm.
However, if you look at the progress in the machine learning space over the last five years or so, the stuff I did is being replaced by the machine. That could be seen as a threat. It’s like the self-checkout at the supermarket, where people use the machines to scan and pack things themselves. It’s unquestionably progress and has efficiency benefits, but those self-scanning machines are a threat to the jobs of the people working at the checkout. Inevitably, there will be jobs that just don’t need anywhere near the level of human supervision that’s required today.
That’s sort of how I felt initially about the emergence of machine learning; it’s a disruption to how computer vision has been done in the past.
Another example would be the haulage industry. There are millions of lorries being driven by humans, but humans can only work certain hours and have relatively high costs. That looks to me like a prime candidate for autonomous driving to improve efficiency. You could have trucks on the road that run 24 hours a day and at speeds that are as fuel-efficient as possible. I would expect there to be a significant loss of employment in that sort of industry, and that’s just one example.
I guess the other threat is if the machines turn on us and take over the world! I’m not sure I see that happening. I think we’re a long way from a general intelligence that would pose that sort of threat. Sorry!
Well, that’s reassuring. What’s the coolest use you’ve seen for AI?
The coolest use? It may be sort of nerdy, but training a neural network today takes quite a lot of art, in terms of choosing the right hyperparameters and getting the dataset right. So there’s work going on to use machine learning to train your machine learning algorithms – machine learning to train machine learning! I think that’s where things are heading right now, and that seems to me quite cool.
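The simplest version of automating that “art” is a hyperparameter search. The sketch below is purely illustrative and not anything described in the interview: it does a random search over learning rate and batch size against a hypothetical, analytic stand-in for validation accuracy (in practice the objective would be a full training run).

```python
import random

def validation_score(learning_rate, batch_size):
    # Hypothetical response surface, peaking at lr=0.01 and batch_size=64;
    # a stand-in for "train the network and measure validation accuracy".
    return -((learning_rate - 0.01) ** 2) * 1e4 - ((batch_size - 64) ** 2) * 1e-3

random.seed(0)
best = None
for _ in range(200):  # 200 random trials
    trial = {
        "learning_rate": 10 ** random.uniform(-4, -1),  # log-uniform in 1e-4..1e-1
        "batch_size": random.choice([16, 32, 64, 128, 256]),
    }
    score = validation_score(**trial)
    if best is None or score > best[0]:
        best = (score, trial)

print("best trial:", best[1])
```

More sophisticated approaches replace the random sampling with a learned model of the objective (Bayesian optimisation, or neural architecture search), which is the “machine learning to train machine learning” direction described above.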
That sounds like the machines building the machines?
OK, maybe they will take over the world then! Seriously, though, I think AI enables you to do more, and to do it efficiently, reducing the amount of human effort required. In that sense, it’s the next Industrial Revolution. Back then, with a machine you could replace what would have taken ten people; that drove huge productivity improvements, which ultimately led to a better outcome for everyone.
What are the differences between machine learning, deep learning, AI and convolutional neural networks?
I see AI as a general solution. If you visualise a Venn diagram, AI is something that can solve a wide range of things. Machine learning is part of the AI space, but there’s an awful lot it doesn’t do, such as generalise. Deep learning is a specific technique within the machine learning space, and convolutional neural networks are an example of deep learning. So you can see it as a series of nested, narrowing spaces. Convolutional neural networks have had a lot of publicity recently and there’s a huge amount of progress being made in that area, but that needs to expand out quite a long way to get to full AI.
So what do you see as the next step of this AI trend? What is the next step beyond CNNs?
Good question. One of the key areas I think is interesting is the reinforcement learning space. If you look at autonomous vehicles, there are a number of tasks required. Convolutional neural networks help with your sensing – understanding what’s around you and where things are, to build up that map. However, they don’t tell you what actions need to be taken: to brake, to accelerate or to move. They just help build information about what’s going on. Then you have things like reinforcement learning, which helps to build up the technology to really understand the state of the world around you and establish what your next best move is.
There’s a lot of interesting work being done by DeepMind, for example, looking at the board game Go. It’s useful because it’s a problem that has some state, and you have to decide your next move. You can see it as very much the same problem as your autonomous car, which has a state of the world around it, and the vehicle itself needs to determine what the next move is. Do you accelerate? Do you brake? Do you move left? Do you move right? Reinforcement learning helps to solve those sorts of problems. There’s a whole load of interesting work around that, which I would like us to be focusing on.
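The “state in, next best move out” loop can be sketched with tabular Q-learning, the textbook reinforcement learning algorithm. This toy example is entirely my illustration, not DeepMind’s method: an agent in a five-cell corridor learns, from reward alone, that moving right reaches the goal.

```python
import random

N_STATES, ACTIONS = 5, ("left", "right")
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

# Q-table: estimated long-term value of taking each action in each state.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Deterministic toy environment: reward 1.0 for reaching the last cell."""
    nxt = min(state + 1, N_STATES - 1) if action == "right" else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1  # next state, reward, done

def greedy(state):
    """Pick a highest-value action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])

random.seed(1)
for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the Q-table, occasionally explore
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(s)
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        q[(s, a)] += ALPHA * (r + GAMMA * max(q[(s2, a2)] for a2 in ACTIONS) - q[(s, a)])
        s = s2

policy = {s: greedy(s) for s in range(N_STATES)}
print(policy)  # after training, states 0-3 map to "right"
```

A driving policy obviously has a vastly richer state and action space, but the loop has the same shape: observe the state (built up by the CNN-based sensing), pick an action, observe the outcome, and update the value estimates.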
And then there’s prediction. Again, taking autonomous cars as a good example: you don’t just want to understand the world around you. If there’s a pedestrian at the side of the road, what might their next action be? Do they look like they’re about to step out into the road, or do they look like they might fall over? Trying to predict a bit more about what’s likely to happen in the future helps to better guide the car.
If I’m driving and I see some smoke up ahead, that tells me that maybe someone’s braking heavily or there’s likely to be a problem and I will adjust my driving appropriately in the expectation there’s a problem ahead. Right now that level of intelligence isn’t there.
So when do you think we are going to get full autonomous cars? When’s it actually going to happen?
So, a few years ago, someone told me that my son, who is now eight, would never need a driving licence because there would be autonomous cars on the road by the time he’s old enough to drive – so, ten years from now. But when I learned to drive, the car I learned in was about ten years old. If he’s learning to drive in ten years’ time, the car he learns in would need to be on the road today – and that won’t be autonomous.
There are companies saying that in 2020 they’ll start to have autonomous cars, but there won’t be significant volumes and they’ll be somewhat restricted. Maybe they’ll be in certain towns or certain regions; they may have very limited operational ranges. You may start to see small numbers.
I think it’s probably going to be the late 2020s, into the 2030s, before we start to see significant numbers and it becomes more widespread. I might be wrong.
For you then, what is the most exciting stuff going on in AI right now?
I think what is very exciting and stimulating is the rate of progress across a whole range of areas. So, it’s not restricted to one specific problem domain – there’s just so much new interesting work being done across a whole range of different problems. New applications are being discovered every day for this sort of technology.
A lot of what’s being done actually boils down to a relatively small set of operations. So, for us at Imagination, it’s about making sure we understand what those operations are and then enabling a wide range of applications. I think it’s an interesting, exciting space and it’s moving very fast.
Final question. If you had to pinpoint one thing or project that you’ve worked on what would you say that you’re most proud of so far?
So, I think probably the thing I’m most proud of is the work we’ve done around the PowerVR Series2NX NNA. As I said right at the start, that was work that started some years ago and has led to a completely new product line within the company. That is the thing of which I’m most proud. There are a lot of opportunities out there for us to take in that space and hopefully, it’s a product line with a long life ahead of it.