TY - GEN
T1 - OuterSPACE
T2 - 24th IEEE International Symposium on High Performance Computer Architecture, HPCA 2018
AU - Pal, Subhankar
AU - Beaumont, Jonathan
AU - Park, Dong Hyeon
AU - Amarnath, Aporva
AU - Feng, Siying
AU - Chakrabarti, Chaitali
AU - Kim, Hun Seok
AU - Blaauw, David
AU - Mudge, Trevor
AU - Dreslinski, Ronald
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/3/27
Y1 - 2018/3/27
N2 - Sparse matrices are widely used in graph and data analytics, machine learning, engineering and scientific applications. This paper describes and analyzes OuterSPACE, an accelerator targeted at applications that involve large sparse matrices. OuterSPACE is a highly-scalable, energy-efficient, reconfigurable design, consisting of massively parallel Single Program, Multiple Data (SPMD)-style processing units, distributed memories, high-speed crossbars and High Bandwidth Memory (HBM). We identify redundant memory accesses to non-zeros as a key bottleneck in traditional sparse matrix-matrix multiplication algorithms. To ameliorate this, we implement an outer product based matrix multiplication technique that eliminates redundant accesses by decoupling multiplication from accumulation. We demonstrate that traditional architectures, due to limitations in their memory hierarchies and ability to harness parallelism in the algorithm, are unable to take advantage of this reduction without incurring significant overheads. OuterSPACE is designed to specifically overcome these challenges. We simulate the key components of our architecture using gem5 on a diverse set of matrices from the University of Florida's SuiteSparse collection and the Stanford Network Analysis Project and show a mean speedup of 7.9× over Intel Math Kernel Library on a Xeon CPU, 13.0× against cuSPARSE and 14.0× against CUSP when run on an NVIDIA K40 GPU, while achieving an average throughput of 2.9 GFLOPS within a 24 W power budget in an area of 87 mm2.
AB - Sparse matrices are widely used in graph and data analytics, machine learning, engineering and scientific applications. This paper describes and analyzes OuterSPACE, an accelerator targeted at applications that involve large sparse matrices. OuterSPACE is a highly-scalable, energy-efficient, reconfigurable design, consisting of massively parallel Single Program, Multiple Data (SPMD)-style processing units, distributed memories, high-speed crossbars and High Bandwidth Memory (HBM). We identify redundant memory accesses to non-zeros as a key bottleneck in traditional sparse matrix-matrix multiplication algorithms. To ameliorate this, we implement an outer product based matrix multiplication technique that eliminates redundant accesses by decoupling multiplication from accumulation. We demonstrate that traditional architectures, due to limitations in their memory hierarchies and ability to harness parallelism in the algorithm, are unable to take advantage of this reduction without incurring significant overheads. OuterSPACE is designed to specifically overcome these challenges. We simulate the key components of our architecture using gem5 on a diverse set of matrices from the University of Florida's SuiteSparse collection and the Stanford Network Analysis Project and show a mean speedup of 7.9× over Intel Math Kernel Library on a Xeon CPU, 13.0× against cuSPARSE and 14.0× against CUSP when run on an NVIDIA K40 GPU, while achieving an average throughput of 2.9 GFLOPS within a 24 W power budget in an area of 87 mm2.
KW - Application specific hardware
KW - Hardware accelerators
KW - Hardware software co design
KW - Parallel computer architecture
KW - Sparse matrix processing
UR - http://www.scopus.com/inward/record.url?scp=85046825176&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85046825176&partnerID=8YFLogxK
U2 - 10.1109/HPCA.2018.00067
DO - 10.1109/HPCA.2018.00067
M3 - Conference contribution
AN - SCOPUS:85046825176
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 724
EP - 736
BT - Proceedings - 24th IEEE International Symposium on High Performance Computer Architecture, HPCA 2018
PB - IEEE Computer Society
Y2 - 24 February 2018 through 28 February 2018
ER -