Designing a CPU takes a lot of skill and effort. Taking said CPU and reducing its dynamic power consumption by 50% takes a particular set of skills, skills one has acquired over a very long career.
After successfully delivering the first DOK for PowerVR Rogue GPUs, Imagination and Synopsys embarked on a second project that aimed at significantly decreasing power consumption for MIPS CPUs without sacrificing any of the leading performance numbers.
Dynamic power is a hot (pardon the pun) topic among SoC designers
Reducing dynamic power is a major challenge for many SoC designers today. There are various options and factors to consider for synthesizable IP, because the same IP may be used in a wide variety of applications with different requirements across various markets. On top of that, the current crop of technologies, tools and flows adds another dimension of challenges.
Performance, power and area (PPA) trade-offs vary from technology to technology and also depend on customer implementation requirements.
This article presents a summary of a recent webinar organized by Synopsys where Maya Mohan and Nagesh Sakhamuru from Imagination presented the various options currently available when using 28nm technologies together with Synopsys tools and flows.
The primary objective was to introduce SoC designers to the various aspects of dynamic power saving when using a specific technology and various sub-library options combined with Synopsys tools and flows. The authors put much of the emphasis on dynamic power relative to leakage and showed the relative power savings at each step and stage of the flow with technology-specific library IP selection.
The CPU chosen for this project was MIPS interAptiv, an ultra-efficient multi-threaded processor that is part of the Aptiv family. In general, a CPU is expected to run at maximum performance while keeping dynamic power as low as possible. However, running at peak performance also tends to increase dynamic power, which creates the need to balance sustained performance against reduced power.
Dynamic power has multiple components
The components of dynamic power in a CPU core are noted below:
- Memory (M-power): instruction and data caches
- Registers (R-power): leaf-level clock-gated and non-gated registers
- Clock network cells (Ck-power): clock gaters and buffers
- Combinational cells (C-power)
Power components notation: Ck: Clock, R: Register, M: Memory, C: Combinational
For each of the four categories above, there are two sub-components: internal and switching power. These were analyzed individually and various options were considered for each during dynamic power optimization.
The synthesis runs were done in DCT/DCG (Design Compiler Topographical/Graphical) and place and route was done in ICC (IC Compiler). Power was measured at both stages: post-DCT as well as post-ICC. For consistency, all the power numbers reported in the webinar were taken from the post-ICC database, using switching activity from gate-level simulation. Extraction was done with StarRC, the Dhrystone diag was used during gate-level simulation and PT-PX (PrimeTime PX) was used for power measurement.
The figure above shows the sub-components of the total dynamic power of the MIPS interAptiv CPU. This breakdown served as the baseline against which the other methods and experiments were evaluated.
Dynamic power is usually measured in mW/MHz, but for this project the total power of the baseline run was normalized to 100, and the power of every other run was derived relative to this number.
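As a rough illustration of this normalization, here is a minimal Python sketch. It keeps the four component categories (Ck, R, M, C) and their internal and switching sub-components, but every raw number in it is a hypothetical placeholder rather than data from the webinar; only the 16% clock share mirrors a figure quoted in this article. The names `baseline_run`, `experiment_run`, `total_power` and `normalize` are made up for this example.

```python
# Sketch: normalize per-component power so the baseline run totals 100.
# All raw numbers are hypothetical placeholders, not data from the webinar.
# The clock share (16% of the baseline) is the only value chosen to mirror
# the article; everything else is invented for illustration.

# component -> (internal, switching), in arbitrary raw units (e.g. mW/MHz)
baseline_run = {
    "Ck": (0.8, 0.8),  # clock network cells
    "R": (2.5, 0.5),   # registers
    "M": (3.0, 0.4),   # memories
    "C": (0.5, 1.5),   # combinational cells
}
experiment_run = {
    "Ck": (0.7, 0.7),
    "R": (2.0, 0.4),
    "M": (2.4, 0.4),
    "C": (0.4, 1.0),
}


def total_power(run):
    """Sum internal and switching power over all components."""
    return sum(internal + switching for internal, switching in run.values())


def normalize(run, baseline_total):
    """Rescale a run so that the baseline total maps to 100."""
    scale = 100.0 / baseline_total
    return {
        name: (internal * scale, switching * scale)
        for name, (internal, switching) in run.items()
    }


baseline_total = total_power(baseline_run)
normalized_baseline = normalize(baseline_run, baseline_total)
normalized_experiment = normalize(experiment_run, baseline_total)

print(f"baseline total:   {total_power(normalized_baseline):.1f}")
print(f"experiment total: {total_power(normalized_experiment):.1f}")
```

Reporting every run on this common scale makes the savings from each experiment directly comparable, regardless of the absolute mW/MHz values.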
As you can see from the figure above, clock power is only about 16% of the overall power. The charts presented here show that the largest consumers are the internal power of memories and registers and the switching power of combinational logic:
- Switching power is proportional to switching capacitance and number of toggles. To improve register and combinational power, most of the optimizations at the DCT stage used RTL-based SAIF (Switching Activity Interchange Format) data along with self-gating techniques. Various dynamic power saving features in ICC were also explored in the webinar, along with the new CCD (Concurrent Clock and Data) feature in the final phase of the runs.
- Internal power is proportional to number of cells, size of cells and number of toggles. The internal power of registers can be reduced by using smaller cells and by reducing the number of toggles. Using a lower-track library (e.g. moving from 12-track to 9-track) provides another way to trade performance and area for lower power. The internal memory power is mostly determined by the size and type of memory selected. There are various memory options available from the Synopsys memory compiler. The authors selected an RF-based combination of high-performance and high-density single-port memories where needed to balance power and performance goals.
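The two proportionalities above can be sketched in a few lines of Python. The switching term uses the standard CMOS relation (each toggle dissipates about half of C·V²). The internal term is a simplified model that assumes each cell has a fixed energy per toggle. All capacitance, voltage, energy and toggle-rate values are hypothetical and are not taken from the webinar.

```python
# Sketch of the proportionalities described above. The switching-power formula
# is the standard CMOS relation; the net and cell data below are hypothetical.


def switching_power(capacitance_f, vdd_v, toggles_per_s):
    """Each toggle charges or discharges the net: 0.5 * C * V^2 joules."""
    return 0.5 * capacitance_f * vdd_v ** 2 * toggles_per_s


def internal_power(cells):
    """Sum of per-cell internal energy per toggle times toggle rate.

    `cells` is a list of (energy_per_toggle_j, toggles_per_s) pairs, so the
    result grows with cell count, with cell size (modelled here as a higher
    energy per toggle) and with toggle rate.
    """
    return sum(energy * rate for energy, rate in cells)


# Hypothetical net: 2 fF at 0.9 V, toggling 200 million times per second.
p_before = switching_power(2e-15, 0.9, 200e6)
# Gating that removes half of the toggles halves the switching power.
p_after = switching_power(2e-15, 0.9, 100e6)

# Hypothetical registers: a smaller cell lowers the energy per toggle.
large_cells = [(4e-15, 100e6)] * 8
small_cells = [(3e-15, 100e6)] * 8

print(f"switching: {p_before:.3e} W -> {p_after:.3e} W")
print(f"internal:  {internal_power(large_cells):.3e} W -> "
      f"{internal_power(small_cells):.3e} W")
```

In this simplified model, fewer toggles help both terms, while smaller cells and lower-track libraries act mainly on the internal term and on the capacitance being switched.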
Here are the very exciting results!
The combination of all the features and techniques mentioned above reduced power significantly. The flow diagram below describes the final tool options used in Synopsys synthesis and ICC:
At the end of the project, overall power consumption was reduced by about 48% and area was decreased by 46%, with only a 15% hit on the performance of the CPU.
The table above summarizes the experiments and their power numbers; for the complete results, please download the full white paper, where you’ll be able to see how power and area were reduced at every step of the process.
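One rough way to read the headline figures together (about 48% less power, 46% less area, 15% less performance) is to express them as ratios against the baseline. The sketch below assumes all three percentages refer to the same baseline run. The derived ratios are simple arithmetic on those figures, not results reported in the webinar.

```python
# Sketch: combine the headline percentages from the article into ratios.
# Assumption: all three percentages are relative to the same baseline run.
power_reduction = 0.48
area_reduction = 0.46
performance_loss = 0.15

relative_power = 1.0 - power_reduction
relative_area = 1.0 - area_reduction
relative_performance = 1.0 - performance_loss

# Derived ratios (not reported in the webinar): performance retained per
# unit of remaining power and per unit of remaining area.
performance_per_power = relative_performance / relative_power
performance_per_area = relative_performance / relative_area

print(f"relative power:        {relative_power:.2f}")
print(f"relative area:         {relative_area:.2f}")
print(f"relative performance:  {relative_performance:.2f}")
print(f"performance per power: {performance_per_power:.2f}x baseline")
print(f"performance per area:  {performance_per_area:.2f}x baseline")
```

Under that assumption, the optimized core delivers roughly 1.6 times the baseline performance per unit of power and per unit of area.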