A 7.3 M Output Non-Zeros/J, 11.7 M Output Non-Zeros/GB Reconfigurable Sparse Matrix-Matrix Multiplication Accelerator

Dong Hyeon Park, Subhankar Pal, Siying Feng, Paul Gao, Jielun Tan, Austin Rovinski, Shaolin Xie, Chun Zhao, Aporva Amarnath, Timothy Wesley, Jonathan Beaumont, Kuan Yu Chen, Chaitali Chakrabarti, Michael Bedford Taylor, Trevor Mudge, David Blaauw, Hun Seok Kim, Ronald G. Dreslinski

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

A sparse matrix-matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40-nm CMOS. The compute fabric consists of dedicated floating-point multiplication units and general-purpose Arm Cortex-M0 and Cortex-M4 cores. The on-chip memory reconfigures as scratchpad or cache, depending on the phase of the algorithm. The memory and compute units are interconnected with synthesizable coalescing crossbars for efficient memory access. The 2.0-mm × 2.6-mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain, and 17.1× (36.9×) compute density gains against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph-based sparse matrices.

Original language: English (US)
Article number: 8947989
Pages (from-to): 933-944
Number of pages: 12
Journal: IEEE Journal of Solid-State Circuits
Volume: 55
Issue number: 4
State: Published - Apr 2020

Keywords

  • Decoupled access execution
  • reconfigurability and accelerator
  • sparse matrix multiplier
  • synthesizable crossbar

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
