Low-Overhead Energy Estimation for Energy-Aware Online Scheduling Techniques
In the last couple of years, power and energy consumption have superseded transistor scaling, as described by Moore's law, as the main limiting factor in creating larger and more powerful digital devices. For the processor to work, energy needs to be supplied to its billions of transistors in the form of electrical current. Today's transistors require a voltage of around 1 volt, yet it is not unusual for a high-performance processor to consume hundreds of watts, which requires a current on the order of hundreds of amperes. After being used for computation, the energy has to be removed from the package as heat. The power density at the surface of the die can easily surpass that of a hot plate, which calls for expensive cooling solutions.
In mobile applications, power also plays a central role, as portable devices are usually powered by a battery. Maximizing battery life requires the SoC to make energy-aware decisions that minimize power consumption. There is also a demand for more compute power in automotive applications, but reliability requirements usually rule out active cooling solutions for the processors. If left standing in the sun, cars can heat up to extreme temperatures, yet all processors are still expected to be adequately cooled, which directly enforces a power limit.
To work within a limited power budget, we need to know the current power consumption of the processor. Power can be measured by inserting an external voltage and current meter into the processor’s power supply to sample its power consumption. However, this makes the power supply more expensive to manufacture. In mobile applications, there might not even be enough space for such an external measurement circuit.
Our approach aims to estimate the power consumption with no additional hardware and only a very small software overhead. We achieve this goal by analyzing the current architectural state of the processor and using this information to estimate the power consumption to within 15% of the value reported by a dedicated measurement circuit.
As previously stated, we cannot use any external power meter to measure the power currently consumed by the processor. However, we should be able to estimate the power consumed by measuring the load of the processor and what sort of workloads it executes.
Modern processors have specialized performance registers that expose information about particular aspects of their execution. These registers are primarily meant to be used by software developers to optimize their applications. In the case of an ARMv8 processor, values like the number of retired instructions or the amount of data written to memory can be accessed. These performance registers are well suited to estimating the power consumption of the processor, as they provide insight into the nature of the processor’s current workload.
The power consumed by a digital circuit can be calculated with:

$$P = P_{idle} + \alpha \cdot C \cdot V^2 \cdot f \qquad (1)$$

where $P_{idle}$ is the power consumed when the system is idle, $V$ is the voltage, $f$ the frequency, $C$ a constant, and $\alpha$ the switching activity of the circuit. The switching activity is directly linked to the load of the processor; if the processor is busy, its transistors keep switching, resulting in a high activity.
We want to simplify formula 1 to better understand the dependency between the power $P$ and the switching activity $\alpha$, and hence the load. We can assume that both frequency and voltage stay constant if we sample at a high enough rate. Formula 1 can therefore be rewritten as $P = a \cdot \alpha + b$, where $a$ and $b$ are constants. We can now use the dependency between $\alpha$ and the load of the processor to deduce:

$$P = c \cdot \text{load} + d \qquad (2)$$

where $c$ and $d$ are again constants.
Once we know the coefficients $c$ and $d$, we can relate the load of the processor to its power consumption without needing an external voltage / current meter. To find $c$ and $d$, a single modified system with a power meter is required. On this system, we can measure both the power and the load and use a statistical method (such as linear regression) to fit $c$ and $d$.
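The calibration step described above can be sketched as an ordinary least-squares fit of a straight line through paired load and power samples. This is a minimal illustration, not the implementation used for the results reported here: the function names `fit_power_model` and `estimate_power` are hypothetical, and the calibration values are made up for the example rather than measured.

```python
def fit_power_model(loads, powers):
    """Fit power = slope * load + intercept by ordinary least squares."""
    n = len(loads)
    if n != len(powers) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_load = sum(loads) / n
    mean_power = sum(powers) / n
    # Sum of squared deviations of the load and the load/power cross term.
    sxx = sum((x - mean_load) ** 2 for x in loads)
    if sxx == 0:
        raise ValueError("load samples must not all be identical")
    sxy = sum((x - mean_load) * (y - mean_power) for x, y in zip(loads, powers))
    slope = sxy / sxx
    intercept = mean_power - slope * mean_load
    return slope, intercept


def estimate_power(load, slope, intercept):
    """Apply the fitted linear model to a new load sample."""
    return slope * load + intercept


# Hypothetical calibration data (illustrative only, not measured values):
# load as a fraction of full utilization, power in watts.
calibration_loads = [0.0, 0.25, 0.5, 0.75, 1.0]
calibration_powers = [1.0, 1.5, 2.0, 2.5, 3.0]

slope, intercept = fit_power_model(calibration_loads, calibration_powers)
```

Once the two coefficients are known, `estimate_power` needs only one multiplication and one addition per sample, which is what keeps the run-time overhead on the unmodified target systems small.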
Figure 1: Energy consumption of the ARM processor running a benchmark suite. The orange curve shows the power reported by the voltage / current meter, the yellow border shows the error the meters introduce, and finally, the blue curve denotes the power predicted with our method.
ARMv8 processors allow us to measure only 3 to 8 performance registers concurrently. This forces us to select the events that best correlate with the load specified in formula 2.
We determined the best-correlating events with an exhaustive search. A subset of the resulting correlation matrix can be found in figure 2. The x-axis denotes the different performance registers, whereas the y-axis shows the different benchmarks. A value of 1 means a high Pearson correlation coefficient between the register content and the power of a certain benchmark, a value of 0 means no correlation. We can see in figure 2 that memory movement (either refills of the L1 caches or accesses to main memory) as well as the number of retired instructions correlate well with the power.
This method allows us to easily find an optimal set of performance registers to estimate the load, which in turn correlates with the power. So far, we can estimate the processor's power consumption with this method with an error of less than 15%. As can be seen in figure 1, the error is comparable to the uncertainty of the external INA3221 meter used as the reference.
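The selection of events by their correlation with power can be sketched as follows. This is a simplified illustration under stated assumptions: it scores each event on its own with the Pearson correlation coefficient and keeps the highest-scoring ones, which is a simplification of the exhaustive search described above because it does not evaluate combinations of events. The helper names `pearson` and `rank_events` are hypothetical, the event names are used only as labels, and all traces are made-up numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sample lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant trace carries no information about power
    return cov / math.sqrt(var_x * var_y)


def rank_events(event_traces, power_trace, max_events):
    """Score every event against the power trace and keep the best ones."""
    scores = {
        name: abs(pearson(trace, power_trace))
        for name, trace in event_traces.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max_events], scores


# Made-up traces for illustration: one counter delta per sampling interval.
power_trace = [1.0, 2.0, 3.0, 4.0]
event_traces = {
    "INST_RETIRED": [10, 20, 30, 40],
    "L1D_CACHE_REFILL": [1, 3, 2, 4],
    "BR_MIS_PRED": [5, 5, 5, 5],
}

selected, scores = rank_events(event_traces, power_trace, max_events=2)
```

The `max_events` parameter reflects the limit on how many performance registers can be sampled concurrently.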
Figure 2: Correlation between the individual performance registers (x-axis) and the activity for two selected benchmark applications.
Our approach uses performance registers already present to estimate the real-time power consumption of a processor or a system on chip without requiring additional hardware.
By sampling only the load of the processor, we can estimate the currently consumed power in real time, which allows us to tackle the problems mentioned in the introduction. For applications with a fixed power budget or limited heat dissipation capabilities, we can estimate the consumed power online and slow the processor down if the power consumption exceeds the given budget or if there is an imminent danger of overheating the system.
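One way such a budget check could look is sketched below. It is an assumption-laden illustration rather than a description of an existing scheduler: it assumes one pair of fitted coefficients per frequency level (since the linear model holds for constant voltage and frequency), and it steps down by one level whenever the estimate exceeds the budget. The function name `throttle_step`, the frequency levels, the coefficients, and the budget are all hypothetical.

```python
def estimate_power(load, slope, intercept):
    """Linear power model fitted for one fixed frequency level."""
    return slope * load + intercept


def throttle_step(load, freq_mhz, models, budget_watts):
    """Return (frequency for the next interval, current power estimate)."""
    slope, intercept = models[freq_mhz]
    estimate = estimate_power(load, slope, intercept)
    if estimate <= budget_watts:
        return freq_mhz, estimate  # within budget, keep the frequency
    # Over budget: move to the next lower frequency level, if there is one.
    lower_levels = [f for f in models if f < freq_mhz]
    if lower_levels:
        return max(lower_levels), estimate
    return freq_mhz, estimate  # already at the lowest level


# Hypothetical per-frequency coefficients (slope, intercept) in watts.
models = {600: (0.8, 0.5), 1200: (2.0, 1.0), 1800: (4.0, 1.5)}
budget_watts = 3.0

first_freq, first_estimate = throttle_step(0.9, 1800, models, budget_watts)
second_freq, second_estimate = throttle_step(0.9, first_freq, models, budget_watts)
```

In this made-up scenario the first call finds the estimate above the budget and steps down one level, and the second call finds the new estimate within the budget and keeps the frequency. A production governor would likely add hysteresis and a path for raising the frequency again, which this sketch omits.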
 The voltage is linked to the frequency; if the processor runs at a higher frequency a higher voltage is required for stable operation.