A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm

Subhankar Pal, Dong Hyeon Park, Siying Feng, Paul Gao, Jielun Tan, Austin Rovinski, Shaolin Xie, Chun Zhao, Aporva Amarnath, Timothy Wesley, Jonathan Beaumont, Kuan Yu Chen, Chaitali Chakrabarti, Michael Taylor, Trevor Mudge, David Blaauw, Hun Seok Kim, Ronald Dreslinski

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm×2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.
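
The abstract refers to "each phase of the algorithm" without naming the phases. Purely as an illustration, the sketch below assumes a two-phase formulation: a multiply phase that generates partial products, followed by a merge phase that accumulates them. The outer-product ordering, the function name `spmm_outer_product`, and the dictionary-based storage layout are assumptions made for this example and are not taken from this record.

```python
from collections import defaultdict


def spmm_outer_product(a_cols, b_rows):
    """Multiply sparse A (stored by column) with sparse B (stored by row).

    a_cols[k] is a list of (row, value) pairs for column k of A.
    b_rows[k] is a list of (col, value) pairs for row k of B.
    Returns {(row, col): value} holding the non-zeros of C = A @ B.
    Hypothetical layout chosen for illustration only.
    """
    # Multiply phase: each shared index k yields a batch of partial products.
    partials = []
    for k, a_col in a_cols.items():
        for i, a_val in a_col:
            for j, b_val in b_rows.get(k, []):
                partials.append((i, j, a_val * b_val))

    # Merge phase: partial products that land on the same output cell are summed.
    merged = defaultdict(float)
    for i, j, v in partials:
        merged[(i, j)] += v

    # Drop entries that cancelled to zero so only true non-zeros remain.
    return {pos: v for pos, v in merged.items() if v != 0.0}


# A = [[1, 0], [0, 2]], B = [[0, 3], [4, 0]]
a_cols = {0: [(0, 1.0)], 1: [(1, 2.0)]}
b_rows = {0: [(1, 3.0)], 1: [(0, 4.0)]}
c = spmm_outer_product(a_cols, b_rows)
```

In this toy formulation the multiply phase streams through its inputs while the merge phase repeatedly revisits output cells, which gives one intuition for why a memory that can act as either scratchpad or cache might suit different phases. That reading is an interpretation, not a claim made in the abstract.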

Original language: English (US)
Title of host publication: 2019 Symposium on VLSI Technology, VLSI Technology 2019 - Digest of Technical Papers
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: C150-C151
ISBN (Electronic): 9784863487178
DOIs: https://doi.org/10.23919/VLSIT.2019.8776507
State: Published - Jun 1 2019
Event: 39th Symposium on VLSI Technology, VLSI Technology 2019 - Kyoto, Japan
Duration: Jun 9 2019 to Jun 14 2019

Publication series

Name: Digest of Technical Papers - Symposium on VLSI Technology
Volume: 2019-June
ISSN (Print): 0743-1562

Conference

Conference: 39th Symposium on VLSI Technology, VLSI Technology 2019
Country: Japan
City: Kyoto
Period: 6/9/19 to 6/14/19

Fingerprint

Particle accelerators
Data storage equipment
Program processors
Energy efficiency
Bandwidth
Graphics processing unit

Keywords

  • decoupled access-execution
  • reconfigurability and accelerator
  • Sparse matrix multiplier
  • synthesizable crossbar

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

Pal, S., Park, D. H., Feng, S., Gao, P., Tan, J., Rovinski, A., ... Dreslinski, R. (2019). A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm. In 2019 Symposium on VLSI Technology, VLSI Technology 2019 - Digest of Technical Papers (pp. C150-C151). [8776507] (Digest of Technical Papers - Symposium on VLSI Technology; Vol. 2019-June). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/VLSIT.2019.8776507

@inproceedings{b8c8b1cb4d48461b950c90efc8ccb5e0,
title = "A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm",
abstract = "A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm×2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.",
keywords = "decoupled access-execution, reconfigurability and accelerator, Sparse matrix multiplier, synthesizable crossbar",
author = "Subhankar Pal and Park, {Dong Hyeon} and Siying Feng and Paul Gao and Jielun Tan and Austin Rovinski and Shaolin Xie and Chun Zhao and Aporva Amarnath and Timothy Wesley and Jonathan Beaumont and Chen, {Kuan Yu} and Chaitali Chakrabarti and Michael Taylor and Trevor Mudge and David Blaauw and Kim, {Hun Seok} and Ronald Dreslinski",
year = "2019",
month = "6",
day = "1",
doi = "10.23919/VLSIT.2019.8776507",
language = "English (US)",
series = "Digest of Technical Papers - Symposium on VLSI Technology",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "C150--C151",
booktitle = "2019 Symposium on VLSI Technology, VLSI Technology 2019 - Digest of Technical Papers",

}

TY  - GEN
T1  - A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm
AU  - Pal, Subhankar
AU  - Park, Dong Hyeon
AU  - Feng, Siying
AU  - Gao, Paul
AU  - Tan, Jielun
AU  - Rovinski, Austin
AU  - Xie, Shaolin
AU  - Zhao, Chun
AU  - Amarnath, Aporva
AU  - Wesley, Timothy
AU  - Beaumont, Jonathan
AU  - Chen, Kuan Yu
AU  - Chakrabarti, Chaitali
AU  - Taylor, Michael
AU  - Mudge, Trevor
AU  - Blaauw, David
AU  - Kim, Hun Seok
AU  - Dreslinski, Ronald
PY  - 2019/6/1
Y1  - 2019/6/1
AB  - A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm×2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law graph based sparse matrices.
KW  - decoupled access-execution
KW  - reconfigurability and accelerator
KW  - Sparse matrix multiplier
KW  - synthesizable crossbar
UR  - http://www.scopus.com/inward/record.url?scp=85070259707&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85070259707&partnerID=8YFLogxK
U2  - 10.23919/VLSIT.2019.8776507
DO  - 10.23919/VLSIT.2019.8776507
M3  - Conference contribution
AN  - SCOPUS:85070259707
T3  - Digest of Technical Papers - Symposium on VLSI Technology
SP  - C150-C151
BT  - 2019 Symposium on VLSI Technology, VLSI Technology 2019 - Digest of Technical Papers
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -