Processor Idle Cycle Aggregation (PICA) is a promising approach to low-power processor execution in which small memory stalls are aggregated into large ones, making it profitable to switch the processor into a low-power mode. We extend the previous approach in three dimensions. First, we develop a static analysis for the PICA technique and present optimal parameters for five common types of loops, based on steady-state analysis. Second, to remedy the weakness of software-only control in varying environments, we enhance PICA with a minimal hardware extension that ensures correct execution for any loop and any parameter setting, thus greatly facilitating exploration-based parameter tuning. Third, we demonstrate that our PICA technique can be applied to certain types of nested loops with variable bounds, thus broadening the applicability of PICA. We validate our analytical model against simulation-based optimization and also show, through experiments on embedded application benchmarks, that our technique can be applied to a wide range of loops with an average energy reduction of 20% compared with execution without PICA.
ASJC Scopus subject areas
- Hardware and Architecture