TY - JOUR
T1 - Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models
T2 - A Survey and Insights
AU - Dave, Shail
AU - Baghdadi, Riyadh
AU - Nowatzki, Tony
AU - Avancha, Sasikanth
AU - Shrivastava, Aviral
AU - Li, Baoxin
N1 - Funding Information:
Manuscript received June 1, 2020; revised March 7, 2021 and July 5, 2021; accepted July 15, 2021. Date of publication August 5, 2021; date of current version September 20, 2021. This work was supported in part by NSF under Grant CCF 1723476—NSF/Intel Joint Research Center for Computer Assisted Programming for Heterogeneous Architectures (CAPA). (Corresponding author: Shail Dave.) Shail Dave, Aviral Shrivastava, and Baoxin Li are with the School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281 USA (e-mail: shail.dave@asu.edu; aviral.shrivastava@asu.edu; baoxin.li@asu.edu). Riyadh Baghdadi is with the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: baghdadi@mit.edu). Tony Nowatzki is with the School of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095 USA (e-mail: tjn@cs.ucla.edu). Sasikanth Avancha is with the Parallel Computing Lab, Intel Labs, Bengaluru 560103, India (e-mail: sasikanth.avancha@intel.com).
Publisher Copyright:
© 2021 IEEE.
PY - 2021/10
Y1 - 2021/10
AB - Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these overparameterized models are compressed by leveraging sparsity, size reduction, and quantization of tensors. Unstructured sparsity and tensors with varying dimensions yield irregular computation, communication, and memory access patterns; processing them on hardware accelerators in a conventional manner does not inherently leverage acceleration opportunities. This article provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of ML models on hardware accelerators. In particular, it discusses enhancement modules in the architecture design and the software support, categorizes different hardware designs and acceleration techniques, analyzes them in terms of hardware and execution costs, analyzes achievable accelerations for recent DNNs, and highlights further opportunities in terms of hardware/software/model codesign optimizations (inter/intramodule). The takeaways from this article include the following: understanding the key challenges in accelerating sparse, irregularly shaped, and quantized tensors; understanding enhancements in accelerator systems for supporting their efficient computations; analyzing tradeoffs in opting for a specific design choice for encoding, storing, extracting, communicating, computing, and load-balancing the nonzeros; understanding how structured sparsity can improve storage efficiency and balance computations; understanding how to compile and map models with sparse tensors on the accelerators; and understanding recent design trends for efficient accelerations and further opportunities.
KW - Compact models
KW - VLSI
KW - compiler optimizations
KW - dataflow
KW - deep learning
KW - deep neural networks (DNNs)
KW - dimension reduction
KW - energy efficiency
KW - hardware/software/model codesign
KW - machine learning (ML)
KW - pruning
KW - quantization
KW - reconfigurable computing
KW - sparsity
KW - spatial architecture
KW - tensor decomposition
UR - http://www.scopus.com/inward/record.url?scp=85112154428&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112154428&partnerID=8YFLogxK
U2 - 10.1109/JPROC.2021.3098483
DO - 10.1109/JPROC.2021.3098483
M3 - Review article
AN - SCOPUS:85112154428
VL - 109
SP - 1706
EP - 1752
JO - Proceedings of the IEEE
JF - Proceedings of the IEEE
SN - 0018-9219
IS - 10
ER -