CRIMSON: Compute-Intensive Loop Acceleration by Randomized Iterative Modulo Scheduling and Optimized Mapping on CGRAs

Mahesh Balasubramanian; Aviral Shrivastava

doi:10.1109/TCAD.2020.3022015

Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Coarse-grain reconfigurable arrays (CGRAs) are emerging accelerators that promise low-power acceleration of compute-intensive loops in applications. The acceleration achieved by a CGRA relies on the CGRA compiler efficiently mapping the compute-intensive loops onto the CGRA architecture. The CGRA mapping problem, being NP-complete, is performed in a two-step process, namely, scheduling and mapping. The scheduling algorithm allocates timeslots to the nodes of the data flow graph, and the mapping algorithm maps the scheduled nodes onto the processing elements of the CGRA. On a mapping failure, the initiation interval (II) is increased and a new schedule is obtained for the increased II. Most previous mapping techniques use the iterative modulo scheduling (IMS) algorithm to find a schedule for a given II. Since IMS generates a resource-constrained as-soon-as-possible (ASAP) schedule, even with an increased II it tends to generate a similar schedule that is not mappable. Therefore, IMS does not explore the schedule space effectively. To address these issues, this article proposes CRIMSON (compute-intensive loop acceleration by randomized IMS and optimized mapping), a technique that generates random modulo schedules by exploring the schedule space, thereby creating different modulo schedules at both a given and an increased II. CRIMSON also employs a novel conservative test after scheduling to prune valid schedules that are not mappable. From our study conducted on the top 24 performance-critical loops (loops that run for more than 7% of application time) from MiBench, Rodinia, and Parboil, we found that previous state-of-the-art approaches that use IMS, such as RAMP and GraphMinor, could not map five and seven loops, respectively, on a 4 × 4 CGRA, whereas CRIMSON was able to map them all. For loops mapped by the previous approaches, CRIMSON achieved a comparable II.
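To make the contrast between ASAP and randomized modulo scheduling concrete, here is a minimal Python sketch. It is not the CRIMSON algorithm from the article and not Rau's full IMS: it assumes a hypothetical six-node data flow graph, unit latency, no loop-carried dependences, and a single pool of `num_pes` identical processing elements, and it omits IMS backtracking, the article's conservative test, and the mapper. `mappable` is a hypothetical callback standing in for the mapping step.

```python
import random
from collections import defaultdict

# Hypothetical toy data flow graph: node -> predecessors (acyclic, unit latency).
DFG = {
    "a": [], "b": [], "c": ["a", "b"], "d": ["a"], "e": ["c", "d"], "f": ["e"],
}


def topo_order(dfg):
    """Return nodes so that every predecessor precedes its consumers."""
    order, seen = [], set()

    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for p in dfg[n]:
            visit(p)
        order.append(n)

    for n in dfg:
        visit(n)
    return order


def modulo_schedule(dfg, ii, num_pes, rng=None):
    """Give each node a start time; at most num_pes nodes share a slot (t mod ii).

    rng=None picks the earliest feasible time (resource-constrained ASAP);
    otherwise a feasible time in [earliest, earliest + ii) is drawn at random.
    Returns {node: time}, or None if some node finds every slot full.
    """
    usage = defaultdict(int)  # modulo reservation table: slot -> PEs in use
    times = {}
    for n in topo_order(dfg):
        earliest = max((times[p] + 1 for p in dfg[n]), default=0)
        # This window visits each of the ii slots exactly once.
        feasible = [t for t in range(earliest, earliest + ii)
                    if usage[t % ii] < num_pes]
        if not feasible:
            return None
        t = feasible[0] if rng is None else rng.choice(feasible)
        times[n] = t
        usage[t % ii] += 1
    return times


def is_valid(dfg, sched, ii, num_pes):
    """Check the dependence and modulo resource constraints of a schedule."""
    deps_ok = all(sched[n] >= sched[p] + 1 for n in dfg for p in dfg[n])
    load = defaultdict(int)
    for t in sched.values():
        load[t % ii] += 1
    return deps_ok and all(v <= num_pes for v in load.values())


def find_schedule(dfg, num_pes, mappable, max_ii=8, tries=10, seed=0):
    """Try several random schedules per II; raise II only when all are rejected.

    `mappable(sched, ii)` is a hypothetical stand-in for the mapper.
    """
    rng = random.Random(seed)
    for ii in range(1, max_ii + 1):
        for _ in range(tries):
            sched = modulo_schedule(dfg, ii, num_pes, rng)
            if sched is not None and mappable(sched, ii):
                return ii, sched
    return None


if __name__ == "__main__":
    asap4 = modulo_schedule(DFG, 4, 2)
    asap5 = modulo_schedule(DFG, 5, 2)
    print(asap4 == asap5)  # True: raising II left the ASAP schedule unchanged
```

With two PEs, the ASAP variant returns the same schedule at II = 4 and II = 5, a toy instance of the behavior the abstract attributes to IMS, while different seeds of the randomized variant produce different valid schedules at the same II, giving a mapper more candidates to try before II is raised.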

Original language	English (US)
Article number	9187262
Pages (from-to)	3300-3310
Number of pages	11
Journal	IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Volume	39
Issue number	11
DOIs	https://doi.org/10.1109/TCAD.2020.3022015
State	Published - Nov 2020

Keywords

Coarse-grained reconfigurable arrays (CGRAs)
compiler
modulo scheduling
randomized scheduling

ASJC Scopus subject areas

Software
Computer Graphics and Computer-Aided Design
Electrical and Electronic Engineering

Access to Document

10.1109/TCAD.2020.3022015

Cite this

@article{f835335450c84c06a98ec00274ac5373,

title = "{CRIMSON}: Compute-Intensive Loop Acceleration by Randomized Iterative Modulo Scheduling and Optimized Mapping on {CGRAs}",

abstract = "Coarse-grain reconfigurable arrays (CGRAs) are emerging accelerators that promise low-power acceleration of compute-intensive loops in applications. The acceleration achieved by CGRA relies on the efficient mapping of the compute-intensive loops by the CGRA compiler, onto the CGRA architecture. The CGRA mapping problem, being NP-complete, is performed in a two-step process, namely, scheduling and mapping. The scheduling algorithm allocates timeslots to the nodes of the data flow graph, and the mapping algorithm maps the scheduled nodes onto the processing elements of the CGRA. On a mapping failure, the initiation interval (II) is increased and a new schedule is obtained for the increased II. Most previous mapping techniques use the iterative modulo scheduling (IMS) algorithm to find a schedule for a given II. Since IMS generates a resource-constrained as-soon-as-possible (ASAP) scheduling, even with increased II, it tends to generate a similar schedule that is not mappable. Therefore, IMS does not explore the schedule space effectively. To address these issues, this article proposes CRIMSON, compute-intensive loop acceleration by randomized IMS and optimized mapping technique that generates random modulo schedules by exploring the schedule space, thereby creating different modulo schedules at a given and increased II. CRIMSON also employs a novel conservative test after scheduling to prune valid schedules that are not mappable. From our study conducted on the top 24 performance-critical loops (run for more than 7\% of application time) from MiBench, Rodinia, and Parboil, we found that previous state-of-the-art approaches that use IMS, such as RAMP and GraphMinor could not map five and seven loops, respectively, on a $4\times 4$ CGRA, whereas CRIMSON was able to map them all. For loops mapped by the previous approaches, CRIMSON achieved a comparable II.",

keywords = "Coarse-grained reconfigurable arrays (CGRAs), compiler, modulo scheduling, randomized scheduling",

author = "Mahesh Balasubramanian and Aviral Shrivastava",

note = "Funding Information: Manuscript received August 6, 2020; accepted August 31, 2020. Date of publication September 7, 2020; date of current version October 27, 2020. This work was supported in part by the National Science Foundation under Grant CSN 1525855 and Grant CCF 1723476 CAPA, and in part by the NSF/Intel Joint Research Center for Computer Assisted Programming for Heterogeneous Architectures. This article was recommended by Associate Editor P. Pande. (Corresponding author: Mahesh Balasubramanian.) Mahesh Balasubramanian is with the School of Computing Informatics Decision and Systems Engineering, Arizona State University, Tempe, AZ 85287 USA (e-mail: mbalasu2@asu.edu). Publisher Copyright: {\textcopyright} 1982-2012 IEEE.",

year = "2020",

month = nov,

doi = "10.1109/TCAD.2020.3022015",

language = "English (US)",

volume = "39",

pages = "3300--3310",

journal = "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems",

issn = "0278-0070",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "11",

}

TY  - JOUR
T1  - CRIMSON: Compute-Intensive Loop Acceleration by Randomized Iterative Modulo Scheduling and Optimized Mapping on CGRAs
AU  - Balasubramanian, Mahesh
AU  - Shrivastava, Aviral
N1  - Funding Information: Manuscript received August 6, 2020; accepted August 31, 2020. Date of publication September 7, 2020; date of current version October 27, 2020. This work was supported in part by the National Science Foundation under Grant CSN 1525855 and Grant CCF 1723476 CAPA, and in part by the NSF/Intel Joint Research Center for Computer Assisted Programming for Heterogeneous Architectures. This article was recommended by Associate Editor P. Pande. (Corresponding author: Mahesh Balasubramanian.) Mahesh Balasubramanian is with the School of Computing Informatics Decision and Systems Engineering, Arizona State University, Tempe, AZ 85287 USA (e-mail: mbalasu2@asu.edu). Publisher Copyright: © 1982-2012 IEEE.
PY  - 2020/11
Y1  - 2020/11
AB  - Coarse-grain reconfigurable arrays (CGRAs) are emerging accelerators that promise low-power acceleration of compute-intensive loops in applications. The acceleration achieved by CGRA relies on the efficient mapping of the compute-intensive loops by the CGRA compiler, onto the CGRA architecture. The CGRA mapping problem, being NP-complete, is performed in a two-step process, namely, scheduling and mapping. The scheduling algorithm allocates timeslots to the nodes of the data flow graph, and the mapping algorithm maps the scheduled nodes onto the processing elements of the CGRA. On a mapping failure, the initiation interval (II) is increased and a new schedule is obtained for the increased II. Most previous mapping techniques use the iterative modulo scheduling (IMS) algorithm to find a schedule for a given II. Since IMS generates a resource-constrained as-soon-as-possible (ASAP) scheduling, even with increased II, it tends to generate a similar schedule that is not mappable. Therefore, IMS does not explore the schedule space effectively. To address these issues, this article proposes CRIMSON, compute-intensive loop acceleration by randomized IMS and optimized mapping technique that generates random modulo schedules by exploring the schedule space, thereby creating different modulo schedules at a given and increased II. CRIMSON also employs a novel conservative test after scheduling to prune valid schedules that are not mappable. From our study conducted on the top 24 performance-critical loops (run for more than 7% of application time) from MiBench, Rodinia, and Parboil, we found that previous state-of-the-art approaches that use IMS, such as RAMP and GraphMinor could not map five and seven loops, respectively, on a 4×4 CGRA, whereas CRIMSON was able to map them all. For loops mapped by the previous approaches, CRIMSON achieved a comparable II.
KW  - Coarse-grained reconfigurable arrays (CGRAs)
KW  - compiler
KW  - modulo scheduling
KW  - randomized scheduling
UR  - http://www.scopus.com/inward/record.url?scp=85090993639&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85090993639&partnerID=8YFLogxK
U2  - 10.1109/TCAD.2020.3022015
DO  - 10.1109/TCAD.2020.3022015
M3  - Article
AN  - SCOPUS:85090993639
SN  - 0278-0070
VL  - 39
SP  - 3300
EP  - 3310
JO  - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF  - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
IS  - 11
M1  - 9187262
ER  - 
