Transmuter: Bridging the efficiency gap using memory and dataflow reconfiguration

Subhankar Pal, Siying Feng, Dong Hyeon Park, Sung Kim, Aporva Amarnath, Chi Sheng Yang, Xin He, Jonathan Beaumont, Kyle May, Yan Xiong, Kuba Kaszyk, John Magnus Morton, Jiawen Sun, Michael O'Boyle, Murray Cole, Chaitali Chakrabarti, David Blaauw, Hun Seok Kim, Trevor Mudge, Ronald Dreslinski

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

With the end of Dennard scaling and Moore's law, it is becoming increasingly difficult to build hardware for emerging applications that meet power and performance targets, while remaining flexible and programmable for end users. This is particularly true for domains that have frequently changing algorithms and applications involving mixed sparse/dense data structures, such as those in machine learning and graph analytics. To overcome this, we present a flexible accelerator called Transmuter, in a novel effort to bridge the gap between General-Purpose Processors (GPPs) and Application-Specific Integrated Circuits (ASICs). Transmuter adapts to changing kernel characteristics, such as data reuse and control divergence, through the ability to reconfigure the on-chip memory type, resource sharing and dataflow at run-time within a short latency. This is facilitated by a fabric of light-weight cores connected to a network of reconfigurable caches and crossbars. Transmuter addresses a rapidly growing set of algorithms exhibiting dynamic data movement patterns, irregularity, and sparsity, while delivering GPU-like efficiencies for traditional dense applications. Finally, in order to support programmability and ease-of-adoption, we prototype a software stack composed of low-level runtime routines, and a high-level language library called TransPy, that cater to expert programmers and end-users, respectively.

Our evaluations with Transmuter demonstrate average throughput (energy-efficiency) improvements of 5.0× (18.4×) and 4.2× (4.0×) over a high-end CPU and GPU, respectively, across a diverse set of kernels predominant in graph analytics, scientific computing and machine learning. Transmuter achieves energy-efficiency gains averaging 3.4× and 2.0× over prior FPGA and CGRA implementations of the same kernels, while remaining on average within 9.3× of state-of-the-art ASICs.
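The abstract describes run-time reconfiguration driven by kernel characteristics such as data reuse and sparsity. The sketch below is purely illustrative and does not reproduce the actual TransPy API or the paper's reconfiguration policy: the function name, parameters, thresholds, and configuration labels are all assumptions, meant only to convey the idea of selecting an on-chip memory mode and dataflow per kernel.

```python
# Hypothetical sketch (NOT the actual TransPy API): pick a memory mode and
# dataflow for a kernel based on its characteristics, mirroring the idea of
# run-time reconfiguration between cache-like and scratchpad-like behavior.

def choose_config(reuse_ratio, sparsity):
    """Return an illustrative hardware configuration for a kernel.

    reuse_ratio: fraction of accesses hitting recently-used data (0..1).
    sparsity: fraction of zero elements in the operands (0..1).
    Thresholds and mode names are invented for illustration.
    """
    # High-reuse kernels benefit from cache-style on-chip memory;
    # streaming kernels prefer explicitly managed scratchpads.
    memory = "cache" if reuse_ratio > 0.5 else "scratchpad"
    # Dense, regular kernels map well to a systolic-style dataflow;
    # sparse, irregular ones to a work-queue style.
    dataflow = "systolic" if sparsity < 0.1 else "work-queue"
    return {"memory": memory, "dataflow": dataflow}

# Dense GEMM-like kernel: high reuse, no sparsity.
print(choose_config(reuse_ratio=0.9, sparsity=0.0))
# Sparse SpMV-like kernel: little reuse, mostly zeros.
print(choose_config(reuse_ratio=0.2, sparsity=0.95))
```

A real system would of course derive these decisions from profiling or compiler analysis rather than two scalar heuristics; the point is only that the same fabric serves both configurations.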

Original language: English (US)
Title of host publication: PACT 2020 - Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 175-190
Number of pages: 16
ISBN (Electronic): 9781450380751
DOIs
State: Published - Sep 30 2020
Event: 2020 ACM International Conference on Parallel Architectures and Compilation Techniques, PACT 2020 - Virtual, Online, United States
Duration: Oct 3 2020 – Oct 7 2020

Publication series

Name: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
ISSN (Print): 1089-795X

Conference

Conference: 2020 ACM International Conference on Parallel Architectures and Compilation Techniques, PACT 2020
Country/Territory: United States
City: Virtual, Online
Period: 10/3/20 – 10/7/20

Keywords

  • Dataflow reconfiguration
  • General-purpose acceleration
  • Hardware acceleration
  • Memory reconfiguration
  • Reconfigurable architectures

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
