RAMP: Resource-aware mapping for CGRAs

Shail Dave, Mahesh Balasubramanian, Aviral Shrivastava

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

A coarse-grained reconfigurable array (CGRA) is a promising solution that can accelerate even non-parallel loops. The acceleration achieved through a CGRA depends critically on the quality of the mapping of loop operations onto the processing elements (PEs) of the CGRA and, in particular, on the compiler's ability to route the dependencies among operations. Previous works have explored several mechanisms to route data dependencies, including routing through other PEs, registers, and memory, and even re-computation. All of these routing options change the graph to be mapped onto the PEs (often by adding new operations), and without re-scheduling it may be impossible to map the new graph. However, existing techniques explore these routing options inside the Place and Route (P&R) phase of the compilation process, which is performed after the scheduling step. As a result, they may either fail to find a mapping or obtain poor results. Our method, RAMP, explicitly and intelligently explores the various routing options before the scheduling step, which improves both mappability and mapping quality. Evaluating the top performance-critical loops of MiBench benchmarks over 12 architectural configurations, we find that RAMP accelerates loops by 23× over sequential execution, achieving a geomean speedup of 2.13× over the state of the art.
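
The abstract's point that routing changes the graph, so the schedule may need to be redone, can be illustrated with a minimal Python sketch. This is not the RAMP algorithm. It assumes a toy data-flow graph, a hypothetical helper `insert_routing_op`, and the standard resource-constrained lower bound on the initiation interval from modulo scheduling (operations divided by PEs, rounded up), none of which are given in the abstract.

```python
from math import ceil

def insert_routing_op(edges, src, dst, name):
    """Replace edge src->dst with src->name->dst, i.e. route the value through a PE.

    Hypothetical helper for illustration only.
    """
    new_edges = [e for e in edges if e != (src, dst)]
    new_edges += [(src, name), (name, dst)]
    return new_edges

def ops_of(edges):
    """Collect the set of operations (nodes) that appear in the edge list."""
    return {node for edge in edges for node in edge}

def res_mii(num_ops, num_pes):
    """Resource-constrained lower bound on the initiation interval."""
    return ceil(num_ops / num_pes)

# Assumed toy loop body: four operations on an assumed 2x2 CGRA (4 PEs).
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
num_pes = 4

# Routing the dependency a->c through another PE adds one operation.
routed = insert_routing_op(edges, "a", "c", "route_a_c")

before = res_mii(len(ops_of(edges)), num_pes)   # 4 ops on 4 PEs
after = res_mii(len(ops_of(routed)), num_pes)   # 5 ops on 4 PEs

print(before, after)
```

In this toy case the extra routing operation raises the lower bound from 1 to 2, so a schedule computed for the original graph no longer fits the routed graph. That is the kind of situation in which choosing routing options before scheduling could help.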

Original language: English (US)
Title of host publication: Proceedings of the 55th Annual Design Automation Conference, DAC 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Volume: Part F137710
ISBN (Print): 9781450357005
DOIs: https://doi.org/10.1145/3195970.3196101
State: Published - Jun 24 2018
Event: 55th Annual Design Automation Conference, DAC 2018 - San Francisco, United States
Duration: Jun 24 2018 → Jun 29 2018

Other

Other: 55th Annual Design Automation Conference, DAC 2018
Country: United States
City: San Francisco
Period: 6/24/18 → 6/29/18

Fingerprint

Routing
Resources
Scheduling
Accelerate
Rescheduling
Data Dependency
Compilation
Graph in graph theory
Compiler
Data storage equipment
Speedup
Benchmark
Configuration

ASJC Scopus subject areas

  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Modeling and Simulation

Cite this

Dave, S., Balasubramanian, M., & Shrivastava, A. (2018). RAMP: Resource-aware mapping for CGRAs. In Proceedings of the 55th Annual Design Automation Conference, DAC 2018 (Vol. Part F137710). [a127] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3195970.3196101

RAMP : Resource-aware mapping for CGRAs. / Dave, Shail; Balasubramanian, Mahesh; Shrivastava, Aviral.

Proceedings of the 55th Annual Design Automation Conference, DAC 2018. Vol. Part F137710 Institute of Electrical and Electronics Engineers Inc., 2018. a127.

Dave, S, Balasubramanian, M & Shrivastava, A 2018, RAMP: Resource-aware mapping for CGRAs. in Proceedings of the 55th Annual Design Automation Conference, DAC 2018. vol. Part F137710, a127, Institute of Electrical and Electronics Engineers Inc., 55th Annual Design Automation Conference, DAC 2018, San Francisco, United States, 6/24/18. https://doi.org/10.1145/3195970.3196101
Dave S, Balasubramanian M, Shrivastava A. RAMP: Resource-aware mapping for CGRAs. In Proceedings of the 55th Annual Design Automation Conference, DAC 2018. Vol. Part F137710. Institute of Electrical and Electronics Engineers Inc. 2018. a127 https://doi.org/10.1145/3195970.3196101
Dave, Shail ; Balasubramanian, Mahesh ; Shrivastava, Aviral. / RAMP : Resource-aware mapping for CGRAs. Proceedings of the 55th Annual Design Automation Conference, DAC 2018. Vol. Part F137710 Institute of Electrical and Electronics Engineers Inc., 2018.
@inproceedings{71ef0c77afe240c897bd54e143a98f03,
title = "RAMP: Resource-aware mapping for CGRAs",
abstract = "A coarse-grained reconfigurable array (CGRA) is a promising solution that can accelerate even non-parallel loops. The acceleration achieved through a CGRA depends critically on the quality of the mapping of loop operations onto the processing elements (PEs) of the CGRA and, in particular, on the compiler's ability to route the dependencies among operations. Previous works have explored several mechanisms to route data dependencies, including routing through other PEs, registers, and memory, and even re-computation. All of these routing options change the graph to be mapped onto the PEs (often by adding new operations), and without re-scheduling it may be impossible to map the new graph. However, existing techniques explore these routing options inside the Place and Route (P&R) phase of the compilation process, which is performed after the scheduling step. As a result, they may either fail to find a mapping or obtain poor results. Our method, RAMP, explicitly and intelligently explores the various routing options before the scheduling step, which improves both mappability and mapping quality. Evaluating the top performance-critical loops of MiBench benchmarks over 12 architectural configurations, we find that RAMP accelerates loops by 23× over sequential execution, achieving a geomean speedup of 2.13× over the state of the art.",
author = "Shail Dave and Mahesh Balasubramanian and Aviral Shrivastava",
year = "2018",
month = "6",
day = "24",
doi = "10.1145/3195970.3196101",
language = "English (US)",
isbn = "9781450357005",
volume = "Part F137710",
booktitle = "Proceedings of the 55th Annual Design Automation Conference, DAC 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - RAMP

T2 - Resource-aware mapping for CGRAs

AU - Dave, Shail

AU - Balasubramanian, Mahesh

AU - Shrivastava, Aviral

PY - 2018/6/24

Y1 - 2018/6/24

N2 - A coarse-grained reconfigurable array (CGRA) is a promising solution that can accelerate even non-parallel loops. The acceleration achieved through a CGRA depends critically on the quality of the mapping of loop operations onto the processing elements (PEs) of the CGRA and, in particular, on the compiler's ability to route the dependencies among operations. Previous works have explored several mechanisms to route data dependencies, including routing through other PEs, registers, and memory, and even re-computation. All of these routing options change the graph to be mapped onto the PEs (often by adding new operations), and without re-scheduling it may be impossible to map the new graph. However, existing techniques explore these routing options inside the Place and Route (P&R) phase of the compilation process, which is performed after the scheduling step. As a result, they may either fail to find a mapping or obtain poor results. Our method, RAMP, explicitly and intelligently explores the various routing options before the scheduling step, which improves both mappability and mapping quality. Evaluating the top performance-critical loops of MiBench benchmarks over 12 architectural configurations, we find that RAMP accelerates loops by 23× over sequential execution, achieving a geomean speedup of 2.13× over the state of the art.

UR - http://www.scopus.com/inward/record.url?scp=85053670482&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85053670482&partnerID=8YFLogxK

U2 - 10.1145/3195970.3196101

DO - 10.1145/3195970.3196101

M3 - Conference contribution

AN - SCOPUS:85053670482

SN - 9781450357005

VL - Part F137710

BT - Proceedings of the 55th Annual Design Automation Conference, DAC 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -