TY - JOUR
T1 - A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference
AU - Mazumder, Arnab Neelim
AU - Meng, Jian
AU - Rashid, Hasib Al
AU - Kallakuri, Utteja
AU - Zhang, Xin
AU - Seo, Jae Sun
AU - Mohsenin, Tinoosh
N1 - Publisher Copyright: © 2011 IEEE.
PY - 2021/12/1
Y1 - 2021/12/1
AB - Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI) tasks including computer vision, data analytics, robotics, etc. The efficacy of DNNs coincides with the fact that they can provide state-of-the-art inference accuracy for these applications. However, this advantage comes from the high computational complexity of the DNNs in use. Hence, it is becoming increasingly important to scale these DNNs so that they can fit on resource-constrained hardware and edge devices. The main goal is to allow efficient processing of the DNNs on low-power micro-AI platforms without compromising hardware resources and accuracy. In this work, we aim to provide a comprehensive survey about the recent developments in the domain of energy-efficient deployment of DNNs on micro-AI platforms. To this extent, we look at different neural architecture search strategies as part of micro-AI model design, provide extensive details about model compression and quantization strategies in practice, and finally elaborate on the current hardware approaches towards efficient deployment of the micro-AI models on hardware. The main takeaways for a reader from this article will be understanding of different search spaces to pinpoint the best micro-AI model configuration, ability to interpret different quantization and sparsification techniques, and the realization of the micro-AI models on resource-constrained hardware and different design considerations associated with it.
KW - Deep neural networks
KW - hardware accelerators
KW - inference engines
KW - model compression
KW - neural architecture search
KW - quantization
UR - http://www.scopus.com/inward/record.url?scp=85120560885&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85120560885&partnerID=8YFLogxK
U2 - 10.1109/JETCAS.2021.3129415
DO - 10.1109/JETCAS.2021.3129415
M3 - Article
AN - SCOPUS:85120560885
SN - 2156-3357
VL - 11
SP - 532
EP - 547
JO - IEEE Journal on Emerging and Selected Topics in Circuits and Systems
JF - IEEE Journal on Emerging and Selected Topics in Circuits and Systems
IS - 4
ER -