TY - JOUR
T1 - Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models
T2 - A Survey and Insights
AU - Dave, Shail
AU - Baghdadi, Riyadh
AU - Nowatzki, Tony
AU - Avancha, Sasikanth
AU - Shrivastava, Aviral
AU - Li, Baoxin
N1 - Funding Information:
Manuscript received June 1, 2020; revised March 7, 2021 and July 5, 2021; accepted July 15, 2021. Date of publication August 5, 2021; date of current version September 20, 2021. This work was supported in part by NSF under Grant CCF 1723476—NSF/Intel Joint Research Center for Computer Assisted Programming for Heterogeneous Architectures (CAPA). (Corresponding author: Shail Dave.) Shail Dave, Aviral Shrivastava, and Baoxin Li are with the School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281 USA (e-mail: shail.dave@asu.edu; aviral.shrivastava@asu.edu; baoxin.li@asu.edu). Riyadh Baghdadi is with the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: baghdadi@mit.edu). Tony Nowatzki is with the School of Computer Science, University of California at Los Angeles, Los Angeles, CA 90095 USA (e-mail: tjn@cs.ucla.edu). Sasikanth Avancha is with the Parallel Computing Lab, Intel Labs, Bengaluru 560103, India (e-mail: sasikanth.avancha@intel.com).
Publisher Copyright:
© 2021 IEEE.
PY - 2021/10
Y1 - 2021/10
AB - Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these overparameterized models are compressed by leveraging sparsity, size reduction, and quantization of tensors. Unstructured sparsity and tensors with varying dimensions yield irregular computation, communication, and memory access patterns; processing them on hardware accelerators in a conventional manner does not inherently leverage acceleration opportunities. This article provides a comprehensive survey on the efficient execution of sparse and irregular tensor computations of ML models on hardware accelerators. In particular, it discusses enhancement modules in the architecture design and the software support, categorizes different hardware designs and acceleration techniques, analyzes them in terms of hardware and execution costs, analyzes achievable accelerations for recent DNNs, and highlights further opportunities in terms of hardware/software/model codesign optimizations (inter/intramodule). The takeaways from this article include the following: understanding the key challenges in accelerating sparse, irregularly shaped, and quantized tensors; understanding enhancements in accelerator systems for supporting their efficient computations; analyzing tradeoffs in opting for a specific design choice for encoding, storing, extracting, communicating, computing, and load-balancing the nonzeros; understanding how structured sparsity can improve storage efficiency and balance computations; understanding how to compile and map models with sparse tensors on the accelerators; and understanding recent design trends for efficient accelerations and further opportunities.
KW - Compact models
KW - VLSI
KW - compiler optimizations
KW - dataflow
KW - deep learning
KW - deep neural networks (DNNs)
KW - dimension reduction
KW - energy efficiency
KW - hardware/software/model codesign
KW - machine learning (ML)
KW - pruning
KW - quantization
KW - reconfigurable computing
KW - sparsity
KW - spatial architecture
KW - tensor decomposition
UR - http://www.scopus.com/inward/record.url?scp=85112154428&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85112154428&partnerID=8YFLogxK
U2 - 10.1109/JPROC.2021.3098483
DO - 10.1109/JPROC.2021.3098483
M3 - Review article
AN - SCOPUS:85112154428
VL - 109
SP - 1706
EP - 1752
JO - Proceedings of the IEEE
JF - Proceedings of the IEEE
SN - 0018-9219
IS - 10
ER -