A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm

Subhankar Pal, Dong Hyeon Park, Siying Feng, Paul Gao, Jielun Tan, Austin Rovinski, Shaolin Xie, Chun Zhao, Aporva Amarnath, Timothy Wesley, Jonathan Beaumont, Kuan Yu Chen, Chaitali Chakrabarti, Michael Taylor, Trevor Mudge, David Blaauw, Hun Seok Kim, Ronald Dreslinski

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.
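
To make the headline metric concrete, the following is a minimal software sketch of what an SpMM computes and of how "output non-zeros per joule" could be read. The dict-of-dicts storage, the row-wise loop order, and the helper names `spmm`, `count_nnz`, and `energy_joules` are illustrative assumptions, not the chip's dataflow or any API from the paper. Treating the title's 7.3 M output non-zeros/J as a flat rate for every input is likewise an assumption made only for the arithmetic.

```python
from collections import defaultdict

# Headline figure from the title: 7.3 M output non-zeros per joule.
# Using it as a constant rate for any matrix is an assumption of this sketch.
OUTPUT_NNZ_PER_JOULE = 7.3e6


def spmm(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}} dicts.

    A row-wise formulation is used purely for brevity; it is not
    claimed to be the algorithm the accelerator implements.
    """
    c = {}
    for i, a_row in a.items():
        acc = defaultdict(float)
        for k, a_ik in a_row.items():
            # Only rows of B that match a non-zero column of A contribute.
            for j, b_kj in b.get(k, {}).items():
                acc[j] += a_ik * b_kj
        # Drop entries that cancelled to exactly zero.
        row = {j: v for j, v in acc.items() if v != 0.0}
        if row:
            c[i] = row
    return c


def count_nnz(m):
    """Number of stored non-zero entries in a dict-of-dicts matrix."""
    return sum(len(row) for row in m.values())


def energy_joules(output_nnz, nnz_per_joule=OUTPUT_NNZ_PER_JOULE):
    """Energy implied by a given output size at a fixed non-zeros/J rate."""
    return output_nnz / nnz_per_joule


if __name__ == "__main__":
    a = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
    b = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0, 2: 7.0}}
    c = spmm(a, b)
    print(c)  # {0: {1: 16.0, 2: 14.0}, 1: {0: 15.0}}
    print(count_nnz(c), "output non-zeros")
    print(energy_joules(count_nnz(c)), "J at the assumed flat rate")
```

Under that flat-rate assumption, one output non-zero corresponds to roughly 1 / 7.3e6 J, or about 137 nJ.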

Original language: English (US)
Title of host publication: 2019 Symposium on VLSI Circuits, VLSI Circuits 2019 - Digest of Technical Papers
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: C150-C151
ISBN (Electronic): 9784863487185
DOIs: https://doi.org/10.23919/VLSIC.2019.8778147
State: Published - Jun 2019
Event: 33rd Symposium on VLSI Circuits, VLSI Circuits 2019 - Kyoto, Japan
Duration: Jun 9 2019 to Jun 14 2019

Publication series

Name: IEEE Symposium on VLSI Circuits, Digest of Technical Papers
Volume: 2019-June

Conference

Conference: 33rd Symposium on VLSI Circuits, VLSI Circuits 2019
Country: Japan
City: Kyoto
Period: 6/9/19 to 6/14/19

Fingerprint

Particle accelerators
Data storage equipment
Program processors
Energy efficiency
Bandwidth
Graphics processing unit

Keywords

  • decoupled access-execution
  • reconfigurability and accelerator
  • Sparse matrix multiplier
  • synthesizable crossbar

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Electrical and Electronic Engineering

Cite this

Pal, S., Park, D. H., Feng, S., Gao, P., Tan, J., Rovinski, A., ... Dreslinski, R. (2019). A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm. In 2019 Symposium on VLSI Circuits, VLSI Circuits 2019 - Digest of Technical Papers (pp. C150-C151). [8778147] (IEEE Symposium on VLSI Circuits, Digest of Technical Papers; Vol. 2019-June). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/VLSIC.2019.8778147


@inproceedings{195c9bf2801c42babd8fc4448bf53acd,
title = "A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm",
abstract = "A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.",
keywords = "decoupled access-execution, reconfigurability and accelerator, Sparse matrix multiplier, synthesizable crossbar",
author = "Subhankar Pal and Park, {Dong Hyeon} and Siying Feng and Paul Gao and Jielun Tan and Austin Rovinski and Shaolin Xie and Chun Zhao and Aporva Amarnath and Timothy Wesley and Jonathan Beaumont and Chen, {Kuan Yu} and Chaitali Chakrabarti and Michael Taylor and Trevor Mudge and David Blaauw and Kim, {Hun Seok} and Ronald Dreslinski",
year = "2019",
month = "6",
doi = "10.23919/VLSIC.2019.8778147",
language = "English (US)",
series = "IEEE Symposium on VLSI Circuits, Digest of Technical Papers",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "C150--C151",
booktitle = "2019 Symposium on VLSI Circuits, VLSI Circuits 2019 - Digest of Technical Papers",

}

TY  - GEN
T1  - A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm
AU  - Pal, Subhankar
AU  - Park, Dong Hyeon
AU  - Feng, Siying
AU  - Gao, Paul
AU  - Tan, Jielun
AU  - Rovinski, Austin
AU  - Xie, Shaolin
AU  - Zhao, Chun
AU  - Amarnath, Aporva
AU  - Wesley, Timothy
AU  - Beaumont, Jonathan
AU  - Chen, Kuan Yu
AU  - Chakrabarti, Chaitali
AU  - Taylor, Michael
AU  - Mudge, Trevor
AU  - Blaauw, David
AU  - Kim, Hun Seok
AU  - Dreslinski, Ronald
PY  - 2019/6
Y1  - 2019/6
N2  - A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.
AB  - A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.
KW  - decoupled access-execution
KW  - reconfigurability and accelerator
KW  - Sparse matrix multiplier
KW  - synthesizable crossbar
UR  - http://www.scopus.com/inward/record.url?scp=85073915110&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85073915110&partnerID=8YFLogxK
U2  - 10.23919/VLSIC.2019.8778147
DO  - 10.23919/VLSIC.2019.8778147
M3  - Conference contribution
AN  - SCOPUS:85073915110
T3  - IEEE Symposium on VLSI Circuits, Digest of Technical Papers
SP  - C150-C151
BT  - 2019 Symposium on VLSI Circuits, VLSI Circuits 2019 - Digest of Technical Papers
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -