SIMD (Single Instruction, Multiple Data) is a class of instructions in modern CPU designs that boosts data-parallel performance by applying one operation to several data elements at once.

MSA (MIPS SIMD Architecture) gives developers access to a flexible, powerful 128-bit SIMD engine that accelerates multimedia and other compute-intensive applications.

New application processors that implement MSA can support a wide range of applications without depending heavily on dedicated hardware accelerators. Because the work runs in software, platforms built on these MIPS-based processors can evolve and adapt to new tasks with leading-edge requirements.

In addition, MIPS SIMD and other programmable solutions (e.g. GPU compute) give future embedded and heterogeneous platforms the flexibility to take on tasks that are not yet known today.

One such application is video processing, the topic of a recently published Imagination whitepaper that examines how MIPS Warrior CPUs can accelerate VP9 codecs using MSA instructions.

The VP9 codec is part of the Google-sponsored WebM Project and achieves better quality at approximately half the file size of previous-generation encoding technologies. VP9 is well suited to streaming Full HD and 4K video (e.g. YouTube) and can be deployed in WebRTC-based video conferencing applications.

Frequently used operations in multimedia processing which can be vectorized using MSA include:

  • Addition/subtraction operations
  • Multiply-accumulate operations (dot products and simple multiplications)
  • Logical and arithmetic shift operations (with optional rounding)
  • Other logical operations (AND, OR, XOR, etc.)
  • Conditional selection/masking
  • Load and store
  • Pack-unpack/interleaving operations
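To make the list above more concrete, here is a minimal portable C sketch of what two of these operations compute across one 128-bit register: a multiply-accumulate dot product (eight signed 16-bit pairs reduced into four 32-bit lanes) and an arithmetic right shift with rounding. The function names `dotp_s_w` and `srari_w` are hypothetical, chosen to echo the MSA DOTP_S.W and SRARI.W instructions; the loops are only a scalar model of the per-lane results, not MSA code, and they assume the arithmetic right shift of negative values that GCC and Clang provide.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 128-bit register seen as 8 x int16 or 4 x int32 lanes.
   On MSA hardware each loop below corresponds to a single vector instruction. */

/* Multiply-accumulate: each 32-bit output lane is the sum of the products of
   one adjacent (even, odd) pair of signed 16-bit inputs (modelled on DOTP_S.W). */
void dotp_s_w(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        int64_t sum = (int64_t)a[2 * i] * b[2 * i] +
                      (int64_t)a[2 * i + 1] * b[2 * i + 1];
        /* Truncate to the 32-bit lane width; the only out-of-range case is
           both pairs being -32768 * -32768 (conversion is implementation-defined). */
        out[i] = (int32_t)sum;
    }
}

/* Arithmetic shift right with rounding (modelled on SRARI.W): add half of
   2^shift before shifting, so each lane rounds to nearest instead of down. */
void srari_w(int32_t v[4], int shift) {
    assert(shift > 0 && shift < 32);
    for (int i = 0; i < 4; i++) {
        /* Widen to 64 bits so adding the rounding constant cannot overflow. */
        v[i] = (int32_t)(((int64_t)v[i] + (INT64_C(1) << (shift - 1))) >> shift);
    }
}
```

For example, multiplying `{1, 2, 3, 4, 5, 6, 7, 8}` by eight ones gives the lanes `{3, 7, 11, 15}`, and a rounding shift by 2 then yields `{1, 2, 3, 4}`. Filter code in a video decoder typically chains exactly these two steps: accumulate weighted pixel sums, then round back down to pixel precision.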

The whitepaper focuses on writing new SIMD-optimized software that exploits the power of the MIPS processor pipeline without resorting to assembly code.

The authors state that the built-in data types and intrinsics offer several advantages to developers using MSA:

  • The same C code is portable across all MSA implementations, so developers don’t have to worry about subtle microarchitectural differences between MIPS CPUs.
  • The compiler takes care of register allocation and instruction scheduling, making efficient use of the available vector registers and instruction throughput to generate the best possible assembly code.
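The following is a minimal sketch of what such intrinsic-based C code can look like: a rounding average of two rows of 8-bit pixels, similar to the averaging step used for compound prediction in VP9 decoders. It assumes GCC or Clang invoked with `-mmsa`, which defines `__mips_msa` and provides `<msa.h>` with the `v16u8`/`v16i8` types and the `__msa_ld_b`, `__msa_aver_u_b` and `__msa_st_b` built-ins. The function name `avg_round_u8` is hypothetical and the code is not taken from the whitepaper or from libvpx. On targets without MSA, the `#else` branch computes the same per-byte result in plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__mips_msa)
#include <msa.h>
#endif

/* dst[i] = (a[i] + b[i] + 1) >> 1 for n bytes; n must be a multiple of 16
   (one 128-bit vector). Buffers must be valid for n bytes. */
void avg_round_u8(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n) {
    assert(n % 16 == 0);
    for (size_t i = 0; i < n; i += 16) {
#if defined(__mips_msa)
        /* Load 16 unsigned bytes from each row into vector registers. */
        v16u8 va = (v16u8)__msa_ld_b(a + i, 0);
        v16u8 vb = (v16u8)__msa_ld_b(b + i, 0);
        /* AVER_U.B: rounding average of all 16 byte lanes in one instruction. */
        v16u8 vr = __msa_aver_u_b(va, vb);
        __msa_st_b((v16i8)vr, dst + i, 0);
#else
        /* Portable fallback producing the same per-lane result. */
        for (size_t j = 0; j < 16; j++) {
            dst[i + j] = (uint8_t)((a[i + j] + b[i + j] + 1) >> 1);
        }
#endif
    }
}
```

The source names no registers and schedules no instructions; those decisions are left to the compiler, which is what lets the same code build for different MSA-capable cores.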

The whitepaper concludes that by using MSA built-ins and data types, developers can shorten development time and still obtain fast, portable code. In the specific example it cites, the instruction count for a typical VP9 decoder workload is reduced by more than a factor of 10.

If you’d like to start writing MSA C code now, you can download the QEMU emulator provided by the prpl Foundation and the free Codescape SDK offered by Imagination. More advanced users can contact us to get access to the professional edition of our Codescape SDK.


  • Michał Szymański

    Is there any document with cycle timings of these SIMD instructions?