A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm

Subhankar Pal; Dong Hyeon Park; Siying Feng; Paul Gao; Jielun Tan; Austin Rovinski; Shaolin Xie; Chun Zhao; Aporva Amarnath; Timothy Wesley; Jonathan Beaumont; Kuan Yu Chen; Chaitali Chakrabarti; Michael Taylor; Trevor Mudge; David Blaauw; Hun Seok Kim; Ronald Dreslinski

doi:10.23919/VLSIC.2019.8778147

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law-graph-based sparse matrices.
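
The abstract refers to "each phase of the algorithm" without naming the phases. As an assumption for illustration only, the sketch below uses an outer-product formulation of sparse matrix-matrix multiplication split into two phases: a multiply phase that forms the outer product of column k of A with row k of B and emits unmerged partial products, and a merge phase that sorts each output row's partial products by column and sums duplicates. The phase split, the function names (`columns_of`, `rows_of`, `multiply_phase`, `merge_phase`, `spgemm_outer`, `count_output_nnz`) and the dict-of-lists storage are hypothetical and are not taken from the chip. The sketch only suggests why the two phases access memory differently: independent streaming products in one, scattered accumulation in the other.

```python
from collections import defaultdict

# Hypothetical storage (an assumption, not the chip's format): a sparse operand
# is a dict mapping an index to a list of (other_index, value) pairs.
# Inputs are given as dense nested lists purely for readability.

def columns_of(dense):
    """CSC-like view of A: column k -> [(row i, value), ...] for non-zeros."""
    cols = defaultdict(list)
    for i, row in enumerate(dense):
        for k, v in enumerate(row):
            if v != 0:
                cols[k].append((i, v))
    return cols

def rows_of(dense):
    """CSR-like view of B: row k -> [(column j, value), ...] for non-zeros."""
    rows = defaultdict(list)
    for k, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows[k].append((j, v))
    return rows

def multiply_phase(a_cols, b_rows):
    """Outer product of column k of A and row k of B for every k.
    Partial products are only appended, never accumulated, so each k
    can be processed independently of the others."""
    partials = defaultdict(list)  # output row i -> [(column j, partial value)]
    for k, a_col in a_cols.items():
        for i, a_val in a_col:
            for j, b_val in b_rows.get(k, []):
                partials[i].append((j, a_val * b_val))
    return partials

def merge_phase(partials):
    """Sort each output row's partial products by column and sum duplicates."""
    c = {}
    for i in sorted(partials):
        merged = []
        for j, v in sorted(partials[i], key=lambda p: p[0]):
            if merged and merged[-1][0] == j:
                merged[-1] = (j, merged[-1][1] + v)  # same (i, j): accumulate
            else:
                merged.append((j, v))
        c[i] = merged
    return c

def spgemm_outer(a_dense, b_dense):
    """C = A x B via the (assumed) multiply phase followed by the merge phase."""
    return merge_phase(multiply_phase(columns_of(a_dense), rows_of(b_dense)))

def count_output_nnz(c):
    """Number of stored output entries (the 'output non-zeros')."""
    return sum(len(row) for row in c.values())

if __name__ == "__main__":
    A = [[1, 0, 2],
         [0, 3, 0]]
    B = [[0, 4],
         [5, 0],
         [6, 7]]
    C = spgemm_outer(A, B)
    print(C)                    # {0: [(0, 12), (1, 18)], 1: [(0, 15)]}
    print(count_output_nnz(C))  # 3
```

In this example the multiply phase emits four partial products and the merge phase reduces them to three output non-zeros. Read the other way, the title's figure of 7.3 M output non-zeros/J corresponds to an average of 1 / (7.3 × 10^6) J ≈ 137 nJ of energy per output non-zero.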

Original language	English (US)
Title of host publication	2019 Symposium on VLSI Circuits, VLSI Circuits 2019 - Digest of Technical Papers
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	C150-C151
ISBN (Electronic)	9784863487185
DOIs	https://doi.org/10.23919/VLSIC.2019.8778147
State	Published - Jun 2019
Event	33rd Symposium on VLSI Circuits, VLSI Circuits 2019 - Kyoto, Japan Duration: Jun 9 2019 → Jun 14 2019

Publication series

Name	IEEE Symposium on VLSI Circuits, Digest of Technical Papers
Volume	2019-June

Conference

Conference	33rd Symposium on VLSI Circuits, VLSI Circuits 2019
Country/Territory	Japan
City	Kyoto
Period	6/9/19 → 6/14/19

Keywords

Sparse matrix multiplier
decoupled access-execution
reconfigurability and accelerator
synthesizable crossbar

ASJC Scopus subject areas

Electronic, Optical and Magnetic Materials
Electrical and Electronic Engineering

Cite this

Pal, S., Park, D. H., Feng, S., Gao, P., Tan, J., Rovinski, A., Xie, S., Zhao, C., Amarnath, A., Wesley, T., Beaumont, J., Chen, K. Y., Chakrabarti, C., Taylor, M., Mudge, T., Blaauw, D., Kim, H. S., & Dreslinski, R. (2019). A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm. In 2019 Symposium on VLSI Circuits, VLSI Circuits 2019 - Digest of Technical Papers (pp. C150-C151). Article 8778147 (IEEE Symposium on VLSI Circuits, Digest of Technical Papers; Vol. 2019-June). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/VLSIC.2019.8778147

@inproceedings{195c9bf2801c42babd8fc4448bf53acd,
  title = "A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm",
  abstract = "A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law-graph-based sparse matrices.",
  keywords = "Sparse matrix multiplier, decoupled access-execution, reconfigurability and accelerator, synthesizable crossbar",
  author = "Subhankar Pal and Park, {Dong Hyeon} and Siying Feng and Paul Gao and Jielun Tan and Austin Rovinski and Shaolin Xie and Chun Zhao and Aporva Amarnath and Timothy Wesley and Jonathan Beaumont and Chen, {Kuan Yu} and Chaitali Chakrabarti and Michael Taylor and Trevor Mudge and David Blaauw and Kim, {Hun Seok} and Ronald Dreslinski",
  note = "Publisher Copyright: {\textcopyright} 2019 JSAP.; 33rd Symposium on VLSI Circuits, VLSI Circuits 2019 ; Conference date: 09-06-2019 Through 14-06-2019",
  year = "2019",
  month = jun,
  doi = "10.23919/VLSIC.2019.8778147",
  language = "English (US)",
  series = "IEEE Symposium on VLSI Circuits, Digest of Technical Papers",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  pages = "C150--C151",
  booktitle = "2019 Symposium on VLSI Circuits, VLSI Circuits 2019 - Digest of Technical Papers",
}

TY  - GEN
T1  - A 7.3 M Output Non-Zeros/J Sparse Matrix-Matrix Multiplication Accelerator using Memory Reconfiguration in 40 nm
AU  - Pal, Subhankar
AU  - Park, Dong Hyeon
AU  - Feng, Siying
AU  - Gao, Paul
AU  - Tan, Jielun
AU  - Rovinski, Austin
AU  - Xie, Shaolin
AU  - Zhao, Chun
AU  - Amarnath, Aporva
AU  - Wesley, Timothy
AU  - Beaumont, Jonathan
AU  - Chen, Kuan Yu
AU  - Chakrabarti, Chaitali
AU  - Taylor, Michael
AU  - Mudge, Trevor
AU  - Blaauw, David
AU  - Kim, Hun Seok
AU  - Dreslinski, Ronald
PY  - 2019/6
Y1  - 2019/6
N2  - A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law-graph-based sparse matrices.
AB  - A Sparse Matrix-Matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and a reconfigurable memory hierarchy is fabricated in 40 nm CMOS. On-chip memories are reconfigured as scratchpad or cache and interconnected with synthesizable coalescing crossbars for efficient memory access in each phase of the algorithm. The 2.0 mm × 2.6 mm chip exhibits 12.6× (8.4×) energy efficiency gain, 11.7× (77.6×) off-chip bandwidth efficiency gain and 17.1× (36.9×) compute density gain against a high-end CPU (GPU) across a diverse set of synthetic and real-world power-law-graph-based sparse matrices.
KW  - Sparse matrix multiplier
KW  - decoupled access-execution
KW  - reconfigurability and accelerator
KW  - synthesizable crossbar
UR  - http://www.scopus.com/inward/record.url?scp=85073915110&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85073915110&partnerID=8YFLogxK
U2  - 10.23919/VLSIC.2019.8778147
DO  - 10.23919/VLSIC.2019.8778147
M3  - Conference contribution
AN  - SCOPUS:85073915110
T3  - IEEE Symposium on VLSI Circuits, Digest of Technical Papers
SP  - C150-C151
BT  - 2019 Symposium on VLSI Circuits, VLSI Circuits 2019 - Digest of Technical Papers
PB  - Institute of Electrical and Electronics Engineers Inc.
T2  - 33rd Symposium on VLSI Circuits, VLSI Circuits 2019
Y2  - 9 June 2019 through 14 June 2019
ER  -
