TY - GEN
T1 - LA-LRU
T2 - 24th International Conference on VLSI Design, VLSI Design 2011, Held Jointly with 10th International Conference on Embedded Systems
AU - Jain, Aarul
AU - Shrivastava, Aviral
AU - Chakrabarti, Chaitali
PY - 2011
Y1 - 2011
N2 - Parameter variations in deep sub-micron integrated circuits cause chip characteristics to deviate during semiconductor fabrication process. These variations are dominant in memory systems such as caches and the delay spread due to process variation impacts the performance of a cache based system significantly. In this paper, we propose two schemes to reduce the performance impact of variations in caches: i) Latency-Aware Least Recently Used (LA-LRU) replacement policy which ensures that cache blocks that are affected by process variation are accessed less frequently, and ii) Block Rearrangement scheme that distributes cache blocks with high latencies to all sets uniformly. We implemented our schemes on the Wattch SimpleScalar toolset for Xscale, PowerPC and Alpha21264-like processor configurations. Our experiments on SPEC 2000 benchmarks show that our scheme improves the average memory access time of caches by 11% to 22%, almost eliminating any performance degradation due to variations. We also synthesized the LA-LRU logic, to find out that we can obtain this benefit at negligible increase in the power consumption of the cache.
AB - Parameter variations in deep sub-micron integrated circuits cause chip characteristics to deviate during semiconductor fabrication process. These variations are dominant in memory systems such as caches and the delay spread due to process variation impacts the performance of a cache based system significantly. In this paper, we propose two schemes to reduce the performance impact of variations in caches: i) Latency-Aware Least Recently Used (LA-LRU) replacement policy which ensures that cache blocks that are affected by process variation are accessed less frequently, and ii) Block Rearrangement scheme that distributes cache blocks with high latencies to all sets uniformly. We implemented our schemes on the Wattch SimpleScalar toolset for Xscale, PowerPC and Alpha21264-like processor configurations. Our experiments on SPEC 2000 benchmarks show that our scheme improves the average memory access time of caches by 11% to 22%, almost eliminating any performance degradation due to variations. We also synthesized the LA-LRU logic, to find out that we can obtain this benefit at negligible increase in the power consumption of the cache.
UR - http://www.scopus.com/inward/record.url?scp=79952835568&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79952835568&partnerID=8YFLogxK
U2 - 10.1109/VLSID.2011.24
DO - 10.1109/VLSID.2011.24
M3 - Conference contribution
AN - SCOPUS:79952835568
SN - 9780769543482
T3 - Proceedings of the IEEE International Conference on VLSI Design
SP - 298
EP - 303
BT - Proceedings - 24th International Conference on VLSI Design, VLSI Design 2011, Held Jointly with 10th International Conference on Embedded Systems
Y2 - 2 January 2011 through 7 January 2011
ER -