Mobile GPU compute must be practical GPU compute

By definition, mobile application scenarios must be power efficient, for a simple reason: they run from a battery. The goal is to let a consumer enjoy the full functionality of a device for as long as possible on a single charge. Any usage scenario must therefore be practical and useful, not something that burns through battery life and leaves an unhappy consumer carrying around an unusable device.

In terms of mobile GPU compute, any compute scenario must be a practical, useful one. The key characteristics explained in my previous article immediately come to mind: only consider tasks suitable for the GPU. Ideally this means parallel compute tasks with minimal divergence, not serial, divergent tasks: in other words, using the appropriate compute resource for the right task.

But the task itself must also be practical, and so must the overall usage scenario of the device.

Examples of practical and impractical mobile GPU compute applications

When running a game with console-quality graphics, using GPU compute for physics calculations does not make sense. While physics calculations are parallel and have limited divergence, in this usage scenario the GPU is already very busy delivering stunning graphical quality to a high-resolution screen; further loading (or, more accurately, overloading) the GPU with a physics task will typically just reduce the overall consumer experience (e.g. lower frame rate and/or lower image quality).

On the other hand, when snapping multi-megapixel pictures with your mobile phone camera, you may want to run some image enhancement routines. Loading this onto the GPU makes sense, as it is a parallel, non-divergent type of task. Additionally, during the processing the user is simply waiting to see their picture, so the GPU will not be very busy – apart from, probably, showing an idle/waiting animation in the GUI.
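To make the "parallel, non-divergent" point concrete, the sketch below applies identical, branch-free arithmetic to every pixel of an image. Every pixel runs the same instructions with no data-dependent branching, which is exactly the shape of work that maps well onto wide GPU hardware. (This is an illustrative sketch in NumPy; the filter, parameter names and values are made up, and a real mobile implementation would run as an OpenCL or RenderScript kernel on the GPU.)

```python
import numpy as np

def enhance(image, contrast=1.2, brightness=10.0):
    """Branch-free per-pixel contrast/brightness adjustment.

    Every pixel executes the identical arithmetic, so there is no
    divergence: ideal for a GPU. Parameters are illustrative only.
    """
    out = image.astype(np.float32) * contrast + brightness
    # np.clip replaces a per-pixel if/else with a uniform operation
    return np.clip(out, 0, 255).astype(np.uint8)

# Stand-in for a camera frame: an 8x8 RGB image of mid-grey pixels
photo = np.full((8, 8, 3), 100, dtype=np.uint8)
result = enhance(photo)
print(result[0, 0])  # each channel: 100 * 1.2 + 10 = 130
```

A divergent task, by contrast, would take a different code path per element, leaving many GPU lanes idle while a few do the work.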

So two different scenarios both pass the type-of-processing check, but only one passes the practical usage check.

There are other usage scenarios that pass the processing-type check but may be at odds with the practicality check. Video encode and decode fall into this grey area: most devices have dedicated resources for these tasks in the form of hardware blocks (video processing units). PowerVR VPUs, for example, are essentially fixed-function hardware and are far more power and bandwidth efficient than a programmable, generalised compute resource such as a PowerVR GPU. However, on platforms that lack dedicated video hardware, transcoding using GPU compute might be a more realistic and efficient way of performing these tasks than, for example, using the CPU.

A failing usage scenario for mobile would be extreme types of compute that require massive processing time and precision, e.g. protein folding or other scientific tasks. These fail the practical check: they are things you should never even consider doing on a mobile device. While you may want to view the results on your mobile device, this type of compute should run on dedicated servers in the cloud.

Biomedical simulations and weather pattern distributions are some examples of impractical use cases for mobile GPU compute

Most compute usage scenarios for mobile, battery-powered devices, at least in the near term, will be practical, common-sense usage scenarios dominated by image and video post-processing and camera vision tasks. All of these meet the checks for type of compute as well as the practicality requirement.

Image processing, camera vision and augmented reality applications are some examples of practical use cases for mobile GPU compute

A basic rule to remember: just because a task is parallel and non-divergent doesn’t mean it should run on a mobile GPU – it must always be a sensible use of your valuable battery life.

If you have any questions or feedback about Imagination’s graphics IP, please use the comments box below. To keep up to date with the latest developments on PowerVR, follow us on Twitter (@GPUCompute, @PowerVRInsider and @ImaginationTech) and subscribe to our blog feed.

  • Alexandru Voica, you're wrong:
    PowerVR G6200 (MediaTek):
    16 USSE2 x 2 clusters x 0.280 GHz x 9 = 80 GFLOPS
    This is equivalent to SGX554 MP4 (80 GFLOPS).
    EE Times says:

    • Hi,
      How does that chart contradict any of the statements in the article above (or my comments)?
      Please try to stay on topic, this article is about mobile GPU compute applications and makes no reference to PowerVR G6200 or any other specific GFLOPS numbers.
      If you need more information on the MT8135, please click on the link below.
      withimagination.imgtec.com/index.php/powervr/mediatek-mt8135-brings-powervr-series6-gpus-to-a-mobile-device-near-you
      Best regards,
      Alex.

        • Below, it was said that the formula was wrong. On the contrary.
          Although that article does not mention it, and it is difficult to know the specific results, it is almost certain that things are as the formula indicates. Of course, the G Series allows much higher clock frequencies, but that is harmful to battery life in handheld devices.
          Thanks for the reply!

        • On the contrary, PowerVR Series6 GPUs introduce a number of hardware features designed to keep power consumption to a minimum (lossless image compression, PVRTC/PVRTC2, etc.).
          Please read the articles carefully before jumping to conclusions or speculating on performance numbers.
          Regards,
          Alex.

          • Whether it’s USSE2 or USC doesn’t matter much. The important thing is that there are 16 pipelines!!
            Ultimately, the PowerVR G6200 (MediaTek) is an 80 GFLOPS part (at 280 MHz). EE Times says so.

  • Because our PowerVR ‘Rogue’ cluster architecture scales linearly in performance, PowerVR G6200 (2 clusters) is 2x the GFLOPS performance of PowerVR G6100 (1 cluster), for the same frequency.
    Another great advantage of PowerVR ‘Rogue’ GPUs is that the cluster-based structure avoids replicating coherency-related overhead resources that competing multicore GPUs still need to maintain.
    Regards,
    Alex.

  • You always write that the new PowerVR G6xx0 has a small area and low power consumption. But…
    1. What is the REAL area of the PowerVR G6100/G6200/G6400?
    2. And what is the frequency of the PowerVR G6400/G6200/G6100 GPU now (for example, the G6200 in the MT8135)?

    • 1. The resulting die size of a core implemented in a system-on-chip can differ depending on whether the semiconductor company optimised the implementation for a smaller area, a higher frequency, or laid it out in a way that better controls power consumption and heat dissipation (at the expense of extra die area). Choices to vary on-die buffer sizes, or considerations around the support resources for outfitting the core with better bandwidth, can also be made and affect the resulting die area.
      I assume PowerVR cores might end up somewhat larger than some competing cores, but trading some extra area for better thermals/power efficiency is the right design choice to make for mobile designs.
      2. Speculation is that the G6200 in the MT8135 may end up getting targeted at around 300 MHz. That’s a low target compared to what some other semiconductor companies have been considering with their Rogue implementations: 400 MHz to 600 MHz and even beyond.

      • Thanks for the answer!
        Can you clarify about the PowerVR G6200 in MT8135:
        1. About the area.
        If you compare the G6200 with SGX544 and SGX554, so its area will be like MP1 or MP2 or MP4 (or between some of them)?
        2. About the frequency.
        MediaTek “said” about 80 GFLOPS for the PowerVR G6200 in the MT8135. But if you look at the “formula”:
        16 USSE2 x 2 clusters x 0.300 GHz x 9 = 86.4 GFLOPS
        So is the frequency of the PowerVR G6200 in the MT8135 less than 300 MHz? Or what is the TRUE formula for calculating GFLOPS (PowerVR G6xxx)?
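The arithmetic quoted in this thread can at least be checked directly. The helper below is a sketch of the commenters’ assumed formula (16 pipelines x 2 clusters x 9 FLOPs per pipeline per cycle x clock in GHz); these factors come from the comments above, not from any official Imagination figure.

```python
def rogue_gflops(pipelines, clusters, clock_ghz, flops_per_pipe_per_cycle=9):
    """Peak GFLOPS per the formula quoted in this comment thread.

    All factors are the commenters' assumptions, not official figures.
    """
    return pipelines * clusters * clock_ghz * flops_per_pipe_per_cycle

# 280 MHz reproduces the "80 GFLOPS" figure quoted earlier (80.64, rounded down)
print(round(rogue_gflops(16, 2, 0.280), 2))  # 80.64
# 300 MHz gives the 86.4 GFLOPS figure from the comment above
print(round(rogue_gflops(16, 2, 0.300), 2))  # 86.4
```

So, taken at face value, the quoted 80 GFLOPS is consistent with a clock of roughly 280 MHz under this formula; the discrepancy at 300 MHz is what the comment is asking about.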

      • Hi,
        Indeed, all PowerVR G6x30 GPUs have been optimised for maximum efficiency and still manage to keep power consumption to a minimum even with an incremental increase in area; PowerVR G6x00 GPUs are designed to deliver the best performance at the smallest area possible.
        An example feature included in PowerVR G6x30 cores is lossless compression which reduces GPU bandwidth usage thus enabling higher performance and reduced power consumption.
        As always, I am unable to comment on the specific GPU frequency for any application processor unless explicitly stated by the silicon vendor.
        Regards,
        Alex.

  • The microkernel of PowerVR GPUs should assist them in getting better results from RenderScript even with the API’s lack of functionality in that regard. Future enhancements to RenderScript, as well as more targeted implementations with Filterscript, should bring improvement, too.
    Determining whether mobile GPU compute makes sense on a task level requires evaluating the trade-offs involved. Processing game physics when it will take away from the visual splendour you were after in the first place can be a bad trade-off, but seeing realistic physical behaviour of in-game objects could also sometimes be more impressive than simply layering additional scene complexity onto the graphics.

  • Great post! Succinct, clear, and spot on.
    Now if only Android’s RenderScript allowed for the distinction of executing hardware. This, and the lack of features, is somewhat frustrating when considering Android’s het-compute API — I sincerely hope it improves to take advantage of the wonderful characteristics of your GPUs.

      • Thanks Alex! Any additional insight into RenderScript and Imagination GPUs (and CPUs) would be very welcome — I’m looking forward to the post! I’m actually excited about the API, though its progress seems somewhat slow.
