RAMP: Resource-aware mapping for CGRAs

Shail Dave; Mahesh Balasubramanian; Aviral Shrivastava

doi:10.1145/3195970.3196101

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

45 Scopus citations

Abstract

A coarse-grained reconfigurable array (CGRA) is a promising solution that can accelerate even non-parallel loops. The acceleration achieved through CGRAs critically depends on the quality of the mapping (of loop operations onto the PEs of the CGRA) and, in particular, on the compiler's ability to route the dependencies among operations. Previous works have explored several mechanisms to route data dependencies, including routing through other PEs, registers, memory, and even re-computation. All these routing options change the graph to be mapped onto PEs (often by adding new operations), and without re-scheduling, it may be impossible to map the new graph. However, existing techniques explore these routing options inside the Place and Route (P&R) phase of the compilation process, which is performed after the scheduling step. As a result, they may either fail to achieve a mapping or obtain poor results. Our method, RAMP, explicitly and intelligently explores the various routing options before the scheduling step, which improves both mapping ability and mapping quality. Evaluating the top performance-critical loops of MiBench benchmarks over 12 architectural configurations, we find that RAMP is able to accelerate loops by 23× over sequential execution, achieving a geomean speedup of 2.13× over the state of the art.
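
The abstract's point that added routing operations can make a graph unmappable without re-scheduling can be illustrated with a minimal sketch. This is not RAMP's algorithm. It assumes a simplified model with homogeneous PEs in which each operation occupies one PE slot, and it ignores recurrence and interconnect constraints. The function names, the 4x4 array size, and the operation counts are hypothetical and chosen only for illustration.

```python
import math


def res_mii(num_ops: int, num_pes: int) -> int:
    """Resource-constrained lower bound on the initiation interval (II).

    With num_pes PEs, at most num_pes * II operations fit into one
    modulo-scheduled iteration, so II >= ceil(num_ops / num_pes).
    """
    if num_ops <= 0 or num_pes <= 0:
        raise ValueError("num_ops and num_pes must be positive")
    return math.ceil(num_ops / num_pes)


def fits_at_ii(num_ops: int, num_pes: int, ii: int) -> bool:
    """Check whether num_ops operations fit into num_pes * ii PE slots."""
    return num_ops <= num_pes * ii


NUM_PES = 16      # hypothetical 4x4 CGRA
LOOP_OPS = 30     # hypothetical number of operations in the loop body
ROUTING_OPS = 4   # hypothetical routing operations added during P&R

# The schedule is fixed first, using only the original loop operations.
scheduled_ii = res_mii(LOOP_OPS, NUM_PES)

# Routing operations added afterwards enlarge the graph.
grown_ops = LOOP_OPS + ROUTING_OPS
still_fits = fits_at_ii(grown_ops, NUM_PES, scheduled_ii)
required_ii = res_mii(grown_ops, NUM_PES)

print(f"II chosen at scheduling time: {scheduled_ii}")
print(f"Grown graph fits at that II: {still_fits}")
print(f"II required after adding routing operations: {required_ii}")
```

Under these assumed numbers, the grown graph no longer fits into the slots available at the II chosen during scheduling, which is the situation the abstract describes for techniques that add routing operations only inside P&R.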

Original language	English (US)
Title of host publication	Proceedings of the 55th Annual Design Automation Conference, DAC 2018
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Print)	9781450357005
DOIs	https://doi.org/10.1145/3195970.3196101
State	Published - Jun 24 2018
Event	55th Annual Design Automation Conference, DAC 2018 - San Francisco, United States
Duration	Jun 24 2018 → Jun 29 2018

Publication series

Name	Proceedings - Design Automation Conference
Volume	Part F137710
ISSN (Print)	0738-100X

Other

Other	55th Annual Design Automation Conference, DAC 2018
Country/Territory	United States
City	San Francisco
Period	6/24/18 → 6/29/18

ASJC Scopus subject areas

Computer Science Applications
Control and Systems Engineering
Electrical and Electronic Engineering
Modeling and Simulation

Access to Document

10.1145/3195970.3196101

Cite this

RAMP: Resource-aware mapping for CGRAs. / Dave, Shail; Balasubramanian, Mahesh; Shrivastava, Aviral.
Proceedings of the 55th Annual Design Automation Conference, DAC 2018. Institute of Electrical and Electronics Engineers Inc., 2018. a127 (Proceedings - Design Automation Conference; Vol. Part F137710).

Dave, S, Balasubramanian, M & Shrivastava, A 2018, RAMP: Resource-aware mapping for CGRAs. in Proceedings of the 55th Annual Design Automation Conference, DAC 2018., a127, Proceedings - Design Automation Conference, vol. Part F137710, Institute of Electrical and Electronics Engineers Inc., 55th Annual Design Automation Conference, DAC 2018, San Francisco, United States, 6/24/18. https://doi.org/10.1145/3195970.3196101

@inproceedings{71ef0c77afe240c897bd54e143a98f03,

title = "RAMP: Resource-aware mapping for CGRAs",

abstract = "A coarse-grained reconfigurable array (CGRA) is a promising solution that can accelerate even non-parallel loops. The acceleration achieved through CGRAs critically depends on the quality of the mapping (of loop operations onto the PEs of the CGRA) and, in particular, on the compiler's ability to route the dependencies among operations. Previous works have explored several mechanisms to route data dependencies, including routing through other PEs, registers, memory, and even re-computation. All these routing options change the graph to be mapped onto PEs (often by adding new operations), and without re-scheduling, it may be impossible to map the new graph. However, existing techniques explore these routing options inside the Place and Route (P\&R) phase of the compilation process, which is performed after the scheduling step. As a result, they may either fail to achieve a mapping or obtain poor results. Our method, RAMP, explicitly and intelligently explores the various routing options before the scheduling step, which improves both mapping ability and mapping quality. Evaluating the top performance-critical loops of MiBench benchmarks over 12 architectural configurations, we find that RAMP is able to accelerate loops by 23× over sequential execution, achieving a geomean speedup of 2.13× over the state of the art.",

author = "Shail Dave and Mahesh Balasubramanian and Aviral Shrivastava",

note = "Funding Information: However, the computation time of RAMP is comparable to that of REGIMap and MEMMap (on the order of seconds), if not always better. Essentially, this stems from higher mapping quality (fewer iterations due to 2× better II) and far fewer nodes to be mapped (i.e., smaller n) in any of the attempts. For example, both REGIMap and MEMMap load the live-in data from memory [10, 13, 18]. In addition, REGIMap cannot spill data and requires many routing operations when constrained by the availability of few local registers. Similarly, MEMMap often routes data via memory, even if enough registers are available. Thus, they have to map 1.5×-2× as many nodes as RAMP. Summary: This paper presents challenges with existing mapping techniques, which are unable to make good use of the routing resources. They first schedule the DDG and then attempt the P\&R; routing is internal to P\&R and is carried out in an ad-hoc manner. As a result, the operations may not be mapped due to resource constraints. This paper introduces RAMP, which models various routing strategies explicitly and flexibly explores various ways to map the data dependencies while exploiting the CGRA resources. RAMP accelerates the top performance-critical loops of MiBench by 23× over sequential execution, and by 2.13× over state-of-the-art techniques. Acknowledgments: This work was partially supported by funding from NSF grants CNS 1525855 and CCF 172346 - NSF/Intel joint research center for Computer Assisted Programming for Heterogeneous Architectures (CAPA). Publisher Copyright: {\textcopyright} 2018 Association for Computing Machinery.; 55th Annual Design Automation Conference, DAC 2018 ; Conference date: 24-06-2018 Through 29-06-2018",

year = "2018",

month = jun,

day = "24",

doi = "10.1145/3195970.3196101",

language = "English (US)",

isbn = "9781450357005",

series = "Proceedings - Design Automation Conference",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "Proceedings of the 55th Annual Design Automation Conference, DAC 2018",

}

TY - GEN

T1 - RAMP: Resource-aware mapping for CGRAs

T2 - 55th Annual Design Automation Conference, DAC 2018

AU - Dave, Shail

AU - Balasubramanian, Mahesh

AU - Shrivastava, Aviral

N1 - Funding Information: However, the computation time of RAMP is comparable to that of REGIMap and MEMMap (on the order of seconds), if not always better. Essentially, this stems from higher mapping quality (fewer iterations due to 2× better II) and far fewer nodes to be mapped (i.e., smaller n) in any of the attempts. For example, both REGIMap and MEMMap load the live-in data from memory [10, 13, 18]. In addition, REGIMap cannot spill data and requires many routing operations when constrained by the availability of few local registers. Similarly, MEMMap often routes data via memory, even if enough registers are available. Thus, they have to map 1.5×-2× as many nodes as RAMP. Summary: This paper presents challenges with existing mapping techniques, which are unable to make good use of the routing resources. They first schedule the DDG and then attempt the P&R; routing is internal to P&R and is carried out in an ad-hoc manner. As a result, the operations may not be mapped due to resource constraints. This paper introduces RAMP, which models various routing strategies explicitly and flexibly explores various ways to map the data dependencies while exploiting the CGRA resources. RAMP accelerates the top performance-critical loops of MiBench by 23× over sequential execution, and by 2.13× over state-of-the-art techniques. Acknowledgments: This work was partially supported by funding from NSF grants CNS 1525855 and CCF 172346 - NSF/Intel joint research center for Computer Assisted Programming for Heterogeneous Architectures (CAPA). Publisher Copyright: © 2018 Association for Computing Machinery.

PY - 2018/6/24

Y1 - 2018/6/24

N2 - A coarse-grained reconfigurable array (CGRA) is a promising solution that can accelerate even non-parallel loops. The acceleration achieved through CGRAs critically depends on the quality of the mapping (of loop operations onto the PEs of the CGRA) and, in particular, on the compiler's ability to route the dependencies among operations. Previous works have explored several mechanisms to route data dependencies, including routing through other PEs, registers, memory, and even re-computation. All these routing options change the graph to be mapped onto PEs (often by adding new operations), and without re-scheduling, it may be impossible to map the new graph. However, existing techniques explore these routing options inside the Place and Route (P&R) phase of the compilation process, which is performed after the scheduling step. As a result, they may either fail to achieve a mapping or obtain poor results. Our method, RAMP, explicitly and intelligently explores the various routing options before the scheduling step, which improves both mapping ability and mapping quality. Evaluating the top performance-critical loops of MiBench benchmarks over 12 architectural configurations, we find that RAMP is able to accelerate loops by 23× over sequential execution, achieving a geomean speedup of 2.13× over the state of the art.

UR - http://www.scopus.com/inward/record.url?scp=85053670482&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053670482&partnerID=8YFLogxK

U2 - 10.1145/3195970.3196101

DO - 10.1145/3195970.3196101

M3 - Conference contribution

AN - SCOPUS:85053670482

SN - 9781450357005

T3 - Proceedings - Design Automation Conference

BT - Proceedings of the 55th Annual Design Automation Conference, DAC 2018

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 24 June 2018 through 29 June 2018

ER -
