TY - GEN
T1 - URECA
T2 - 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018
AU - Dave, Shail
AU - Balasubramanian, Mahesh
AU - Shrivastava, Aviral
N1 - Publisher Copyright:
© 2018 EDAA.
PY - 2018/4/19
Y1 - 2018/4/19
N2 - Coarse-grained reconfigurable array (CGRA) is a promising solution to accelerate loops featuring loop-carried dependencies or low trip-counts. One challenge in compiling for CGRAs is to efficiently manage both recurring (repeatedly written and read) and nonrecurring (read-only) variables of loops. Although prior works manage recurring variables in rotating register file (RF), they access the nonrecurring variables through the on-chip memory. It increases memory accesses and degrades the performance. Alternatively, both the variables can be managed in separate rotating and nonrotating RFs. But, it increases code size and effective utilization of the registers becomes challenging. Instead, this paper proposes to manage the variables in a single nonrotating RF. During mapping loop operations on CGRA, the compiler allocates necessary registers and splits RF in rotating and nonrotating parts. While rotation is implemented by a modulo addition based indexing mechanism, read-only values are preloaded and directly accessed. Evaluating compute-intensive benchmarks from MiBench show that URECA provides a geomean speedup of 11.41x over sequential loop execution. It improves the loop acceleration through CGRAs by 1.74x at 32% reduced energy consumption over state-of-the-art.
AB - Coarse-grained reconfigurable array (CGRA) is a promising solution to accelerate loops featuring loop-carried dependencies or low trip-counts. One challenge in compiling for CGRAs is to efficiently manage both recurring (repeatedly written and read) and nonrecurring (read-only) variables of loops. Although prior works manage recurring variables in rotating register file (RF), they access the nonrecurring variables through the on-chip memory. It increases memory accesses and degrades the performance. Alternatively, both the variables can be managed in separate rotating and nonrotating RFs. But, it increases code size and effective utilization of the registers becomes challenging. Instead, this paper proposes to manage the variables in a single nonrotating RF. During mapping loop operations on CGRA, the compiler allocates necessary registers and splits RF in rotating and nonrotating parts. While rotation is implemented by a modulo addition based indexing mechanism, read-only values are preloaded and directly accessed. Evaluating compute-intensive benchmarks from MiBench show that URECA provides a geomean speedup of 11.41x over sequential loop execution. It improves the loop acceleration through CGRAs by 1.74x at 32% reduced energy consumption over state-of-the-art.
UR - http://www.scopus.com/inward/record.url?scp=85048797891&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048797891&partnerID=8YFLogxK
U2 - 10.23919/DATE.2018.8342172
DO - 10.23919/DATE.2018.8342172
M3 - Conference contribution
AN - SCOPUS:85048797891
T3 - Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018
SP - 1081
EP - 1086
BT - Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 March 2018 through 23 March 2018
ER -