TY - JOUR
T1 - CRIMSON
T2 - Compute-Intensive Loop Acceleration by Randomized Iterative Modulo Scheduling and Optimized Mapping on CGRAs
AU - Balasubramanian, Mahesh
AU - Shrivastava, Aviral
N1 - Funding Information:
Manuscript received August 6, 2020; accepted August 31, 2020. Date of publication September 7, 2020; date of current version October 27, 2020. This work was supported in part by the National Science Foundation under Grant CSN 1525855 and Grant CCF 1723476 CAPA, and in part by the NSF/Intel Joint Research Center for Computer Assisted Programming for Heterogeneous Architectures. This article was recommended by Associate Editor P. Pande. (Corresponding author: Mahesh Balasubramanian.) Mahesh Balasubramanian is with the School of Computing Informatics Decision and Systems Engineering, Arizona State University, Tempe, AZ 85287 USA (e-mail: mbalasu2@asu.edu).
Publisher Copyright:
© 1982-2012 IEEE.
PY - 2020/11
Y1 - 2020/11
N2 - Coarse-grain reconfigurable arrays (CGRAs) are emerging accelerators that promise low-power acceleration of compute-intensive loops in applications. The acceleration achieved by CGRA relies on the efficient mapping of the compute-intensive loops by the CGRA compiler, onto the CGRA architecture. The CGRA mapping problem, being NP-complete, is performed in a two-step process, namely, scheduling and mapping. The scheduling algorithm allocates timeslots to the nodes of the data flow graph, and the mapping algorithm maps the scheduled nodes onto the processing elements of the CGRA. On a mapping failure, the initiation interval (II) is increased and a new schedule is obtained for the increased II. Most previous mapping techniques use the iterative modulo scheduling (IMS) algorithm to find a schedule for a given II. Since IMS generates a resource-constrained as-soon-as-possible (ASAP) scheduling, even with increased II, it tends to generate a similar schedule that is not mappable. Therefore, IMS does not explore the schedule space effectively. To address these issues, this article proposes CRIMSON, compute-intensive loop acceleration by randomized IMS and optimized mapping technique that generates random modulo schedules by exploring the schedule space, thereby creating different modulo schedules at a given and increased II. CRIMSON also employs a novel conservative test after scheduling to prune valid schedules that are not mappable. From our study conducted on the top 24 performance-critical loops (run for more than 7% of application time) from MiBench, Rodinia, and Parboil, we found that previous state-of-the-art approaches that use IMS, such as RAMP and GraphMinor could not map five and seven loops, respectively, on a $4\times 4$ CGRA, whereas CRIMSON was able to map them all. For loops mapped by the previous approaches, CRIMSON achieved a comparable II.
AB - Coarse-grain reconfigurable arrays (CGRAs) are emerging accelerators that promise low-power acceleration of compute-intensive loops in applications. The acceleration achieved by CGRA relies on the efficient mapping of the compute-intensive loops by the CGRA compiler, onto the CGRA architecture. The CGRA mapping problem, being NP-complete, is performed in a two-step process, namely, scheduling and mapping. The scheduling algorithm allocates timeslots to the nodes of the data flow graph, and the mapping algorithm maps the scheduled nodes onto the processing elements of the CGRA. On a mapping failure, the initiation interval (II) is increased and a new schedule is obtained for the increased II. Most previous mapping techniques use the iterative modulo scheduling (IMS) algorithm to find a schedule for a given II. Since IMS generates a resource-constrained as-soon-as-possible (ASAP) scheduling, even with increased II, it tends to generate a similar schedule that is not mappable. Therefore, IMS does not explore the schedule space effectively. To address these issues, this article proposes CRIMSON, compute-intensive loop acceleration by randomized IMS and optimized mapping technique that generates random modulo schedules by exploring the schedule space, thereby creating different modulo schedules at a given and increased II. CRIMSON also employs a novel conservative test after scheduling to prune valid schedules that are not mappable. From our study conducted on the top 24 performance-critical loops (run for more than 7% of application time) from MiBench, Rodinia, and Parboil, we found that previous state-of-the-art approaches that use IMS, such as RAMP and GraphMinor could not map five and seven loops, respectively, on a $4\times 4$ CGRA, whereas CRIMSON was able to map them all. For loops mapped by the previous approaches, CRIMSON achieved a comparable II.
KW - Coarse-grained reconfigurable arrays (CGRAs)
KW - compiler
KW - modulo scheduling
KW - randomized scheduling
UR - http://www.scopus.com/inward/record.url?scp=85090993639&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090993639&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2020.3022015
DO - 10.1109/TCAD.2020.3022015
M3 - Article
AN - SCOPUS:85090993639
SN - 0278-0070
VL - 39
SP - 3300
EP - 3310
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS - 11
M1 - 9187262
ER -