A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference

Arnab Neelim Mazumder, Jian Meng, Hasib Al Rashid, Utteja Kallakuri, Xin Zhang, Jae Sun Seo, Tinoosh Mohsenin

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Deep neural networks (DNNs) are being prototyped for a variety of artificial intelligence (AI) tasks, including computer vision, data analytics, and robotics. The efficacy of DNNs coincides with the fact that they can provide state-of-the-art inference accuracy for these applications. However, this advantage comes from the high computational complexity of the DNNs in use. Hence, it is becoming increasingly important to scale these DNNs so that they can fit on resource-constrained hardware and edge devices. The main goal is to allow efficient processing of the DNNs on low-power micro-AI platforms without compromising hardware resources and accuracy. In this work, we aim to provide a comprehensive survey of recent developments in the energy-efficient deployment of DNNs on micro-AI platforms. To this end, we look at different neural architecture search strategies as part of micro-AI model design, provide extensive details about model compression and quantization strategies in practice, and finally elaborate on current hardware approaches to the efficient deployment of micro-AI models. The main takeaways for a reader of this article will be an understanding of the different search spaces used to pinpoint the best micro-AI model configuration, the ability to interpret different quantization and sparsification techniques, and an appreciation of how micro-AI models are realized on resource-constrained hardware and of the design considerations associated with doing so.

Original language: English (US)
Pages (from-to): 532-547
Number of pages: 16
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Volume: 11
Issue number: 4
State: Published - Dec 1 2021
Externally published: Yes

Keywords

  • Deep neural networks
  • hardware accelerators
  • inference engines
  • model compression
  • neural architecture search
  • quantization

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
