TY - GEN
T1 - REGIMap
T2 - 50th Annual Design Automation Conference, DAC 2013
AU - Hamzeh, Mahdi
AU - Shrivastava, Aviral
AU - Vrudhula, Sarma
PY - 2013
Y1 - 2013
AB - Coarse-Grained Reconfigurable Architectures (CGRAs) are an extremely attractive platform when both performance and power efficiency are paramount. Although the power efficiency of CGRAs can be very high, their performance critically hinges on the capabilities of the compiler, because a CGRA compiler must perform explicit pipelining, scheduling, placement, and routing of operations. Existing CGRA compilers struggle with two main problems: 1) effectively utilizing the local register files in the processing elements (PEs), and 2) high compilation times. This paper significantly improves the state of the art in CGRA compilers. It first creates a precise and general formulation of the problem of loop mapping on CGRAs that takes the local registers into account. Then, from the insights gained from this formulation, it distills an efficient and constructive heuristic solution. We show that the mapping problem, once characterized, can be reduced to the problem of finding a maximal weighted clique in the product graph of the time-extended CGRA and the data dependence graph of the kernel. When applied to several kernels from multimedia and SPEC2006 benchmarks, our heuristic achieves on average 1.89 times better performance than state-of-the-art methods. A unique feature of our heuristic is that it learns from failed attempts and constructively changes the schedule to achieve better mappings at lower compilation times.
UR - http://www.scopus.com/inward/record.url?scp=84879852172&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84879852172&partnerID=8YFLogxK
U2 - 10.1145/2463209.2488756
DO - 10.1145/2463209.2488756
M3 - Conference contribution
AN - SCOPUS:84879852172
SN - 9781450320719
T3 - Proceedings - Design Automation Conference
BT - Proceedings of the 50th Annual Design Automation Conference, DAC 2013
Y2 - 29 May 2013 through 7 June 2013
ER -