A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference

Arnab Neelim Mazumder; Jian Meng; Hasib Al Rashid; Utteja Kallakuri; Xin Zhang; Jae Sun Seo; Tinoosh Mohsenin

doi:10.1109/JETCAS.2021.3129415

Research output: Contribution to journal › Article › peer-review

20 Scopus citations

Abstract

Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI) tasks including computer vision, data analytics, robotics, etc. The efficacy of DNNs coincides with the fact that they can provide state-of-the-art inference accuracy for these applications. However, this advantage comes from the high computational complexity of the DNNs in use. Hence, it is becoming increasingly important to scale these DNNs so that they can fit on resource-constrained hardware and edge devices. The main goal is to allow efficient processing of the DNNs on low-power micro-AI platforms without compromising hardware resources and accuracy. In this work, we aim to provide a comprehensive survey about the recent developments in the domain of energy-efficient deployment of DNNs on micro-AI platforms. To this extent, we look at different neural architecture search strategies as part of micro-AI model design, provide extensive details about model compression and quantization strategies in practice, and finally elaborate on the current hardware approaches towards efficient deployment of the micro-AI models on hardware. The main takeaways for a reader from this article will be understanding of different search spaces to pinpoint the best micro-AI model configuration, ability to interpret different quantization and sparsification techniques, and the realization of the micro-AI models on resource-constrained hardware and different design considerations associated with it.

Original language	English (US)
Pages (from-to)	532-547
Number of pages	16
Journal	IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Volume	11
Issue number	4
DOIs	https://doi.org/10.1109/JETCAS.2021.3129415
State	Published - Dec 1 2021

Keywords

Deep neural networks
hardware accelerators
inference engines
model compression
neural architecture search
quantization

ASJC Scopus subject areas

Electrical and Electronic Engineering

Cite this

@article{b4ce034c7a3b4db3a7ba907e4d0adeb7,

title = "A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference",

abstract = "Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI) tasks including computer vision, data analytics, robotics, etc. The efficacy of DNNs coincides with the fact that they can provide state-of-the-art inference accuracy for these applications. However, this advantage comes from the high computational complexity of the DNNs in use. Hence, it is becoming increasingly important to scale these DNNs so that they can fit on resource-constrained hardware and edge devices. The main goal is to allow efficient processing of the DNNs on low-power micro-AI platforms without compromising hardware resources and accuracy. In this work, we aim to provide a comprehensive survey about the recent developments in the domain of energy-efficient deployment of DNNs on micro-AI platforms. To this extent, we look at different neural architecture search strategies as part of micro-AI model design, provide extensive details about model compression and quantization strategies in practice, and finally elaborate on the current hardware approaches towards efficient deployment of the micro-AI models on hardware. The main takeaways for a reader from this article will be understanding of different search spaces to pinpoint the best micro-AI model configuration, ability to interpret different quantization and sparsification techniques, and the realization of the micro-AI models on resource-constrained hardware and different design considerations associated with it.",

keywords = "Deep neural networks, hardware accelerators, inference engines, model compression, neural architecture search, quantization",

author = "Mazumder, {Arnab Neelim} and Jian Meng and Rashid, {Hasib Al} and Utteja Kallakuri and Xin Zhang and Seo, {Jae Sun} and Tinoosh Mohsenin",

note = "Publisher Copyright: {\textcopyright} 2021 IEEE.",

year = "2021",

month = dec,

day = "1",

doi = "10.1109/JETCAS.2021.3129415",

language = "English (US)",

volume = "11",

pages = "532--547",

journal = "IEEE Journal on Emerging and Selected Topics in Circuits and Systems",

issn = "2156-3357",

publisher = "IEEE Circuits and Systems Society",

number = "4",

}

TY - JOUR

T1 - A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference

AU - Mazumder, Arnab Neelim

AU - Meng, Jian

AU - Rashid, Hasib Al

AU - Kallakuri, Utteja

AU - Zhang, Xin

AU - Seo, Jae Sun

AU - Mohsenin, Tinoosh

PY - 2021/12/1

Y1 - 2021/12/1

N2 - Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI) tasks including computer vision, data analytics, robotics, etc. The efficacy of DNNs coincides with the fact that they can provide state-of-the-art inference accuracy for these applications. However, this advantage comes from the high computational complexity of the DNNs in use. Hence, it is becoming increasingly important to scale these DNNs so that they can fit on resource-constrained hardware and edge devices. The main goal is to allow efficient processing of the DNNs on low-power micro-AI platforms without compromising hardware resources and accuracy. In this work, we aim to provide a comprehensive survey about the recent developments in the domain of energy-efficient deployment of DNNs on micro-AI platforms. To this extent, we look at different neural architecture search strategies as part of micro-AI model design, provide extensive details about model compression and quantization strategies in practice, and finally elaborate on the current hardware approaches towards efficient deployment of the micro-AI models on hardware. The main takeaways for a reader from this article will be understanding of different search spaces to pinpoint the best micro-AI model configuration, ability to interpret different quantization and sparsification techniques, and the realization of the micro-AI models on resource-constrained hardware and different design considerations associated with it.

AB - Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI) tasks including computer vision, data analytics, robotics, etc. The efficacy of DNNs coincides with the fact that they can provide state-of-the-art inference accuracy for these applications. However, this advantage comes from the high computational complexity of the DNNs in use. Hence, it is becoming increasingly important to scale these DNNs so that they can fit on resource-constrained hardware and edge devices. The main goal is to allow efficient processing of the DNNs on low-power micro-AI platforms without compromising hardware resources and accuracy. In this work, we aim to provide a comprehensive survey about the recent developments in the domain of energy-efficient deployment of DNNs on micro-AI platforms. To this extent, we look at different neural architecture search strategies as part of micro-AI model design, provide extensive details about model compression and quantization strategies in practice, and finally elaborate on the current hardware approaches towards efficient deployment of the micro-AI models on hardware. The main takeaways for a reader from this article will be understanding of different search spaces to pinpoint the best micro-AI model configuration, ability to interpret different quantization and sparsification techniques, and the realization of the micro-AI models on resource-constrained hardware and different design considerations associated with it.

KW - Deep neural networks

KW - hardware accelerators

KW - inference engines

KW - model compression

KW - neural architecture search

KW - quantization

UR - http://www.scopus.com/inward/record.url?scp=85120560885&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85120560885&partnerID=8YFLogxK

U2 - 10.1109/JETCAS.2021.3129415

DO - 10.1109/JETCAS.2021.3129415

M3 - Article

AN - SCOPUS:85120560885

SN - 2156-3357

VL - 11

SP - 532

EP - 547

JO - IEEE Journal on Emerging and Selected Topics in Circuits and Systems

JF - IEEE Journal on Emerging and Selected Topics in Circuits and Systems

IS - 4

ER -
