TY - JOUR
T1 - A Real-Time 17-Scale Object Detection Accelerator with Adaptive 2000-Stage Classification in 65 nm CMOS
AU - Kim, Minkyu
AU - Mohanty, Abinash
AU - Kadetotad, Deepak
AU - Wei, Luning
AU - He, Xiaofei
AU - Cao, Yu
AU - Seo, Jae Sun
N1 - Funding Information:
Manuscript received January 6, 2019; revised April 24, 2019; accepted May 30, 2019. Date of publication June 19, 2019; date of current version September 27, 2019. This work was supported in part by NSF under Grant NSF-CCF-1652866, and in part by the Center for Brain-Inspired Computing (C-BRIC), one of six centers in Joint University Microelectronics Program (JUMP), a Semiconductor Research Corporation (SRC) program sponsored by the Defense Advanced Research Projects Agency (DARPA). This paper was recommended by Associate Editor P. K. Meher. (Corresponding author: Minkyu Kim.) M. Kim, A. Mohanty, D. Kadetotad, Y. Cao, and J.-S. Seo are with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA (e-mail: mkim152@asu.edu).
Publisher Copyright:
© 2004-2012 IEEE.
PY - 2019/10
Y1 - 2019/10
N2 - Machine learning has become ubiquitous in applications including object detection, image/video classification, and natural language processing. While machine learning algorithms have been successfully used in many practical applications, accurate, fast, and low-power hardware implementations of such algorithms remain challenging, especially for mobile systems such as Internet of Things (IoT) devices, autonomous vehicles, and smart drones. This paper presents an energy-efficient programmable ASIC accelerator for object detection. Our ASIC accelerator supports programmable multi-class detection (e.g., face, traffic sign, car license plate, and pedestrian), detection of many objects (up to 50) of different sizes in one image (17-scale support with 6 down-/11 up-scaling), and high accuracy (AP of 0.87/0.81/0.72/0.76 for FDDB/AFW/BTSD/Caltech datasets). We designed an integral channel detector with 2,000 classifiers for rigid boosted templates, where the number of stages used for classification can be adaptively controlled depending on the content of the search window. This detector can be implemented with more modular hardware than support vector machine (SVM) and deformable parts model (DPM) designs. By jointly optimizing the algorithm and the efficient hardware architecture, the prototype chip implemented in 65 nm CMOS demonstrates real-time object detection at 20-50 frames/s with low power consumption of 22.5-181.7 mW (0.54-1.75 nJ/pixel) at 0.58-1.1 V supply.
AB - Machine learning has become ubiquitous in applications including object detection, image/video classification, and natural language processing. While machine learning algorithms have been successfully used in many practical applications, accurate, fast, and low-power hardware implementations of such algorithms remain challenging, especially for mobile systems such as Internet of Things (IoT) devices, autonomous vehicles, and smart drones. This paper presents an energy-efficient programmable ASIC accelerator for object detection. Our ASIC accelerator supports programmable multi-class detection (e.g., face, traffic sign, car license plate, and pedestrian), detection of many objects (up to 50) of different sizes in one image (17-scale support with 6 down-/11 up-scaling), and high accuracy (AP of 0.87/0.81/0.72/0.76 for FDDB/AFW/BTSD/Caltech datasets). We designed an integral channel detector with 2,000 classifiers for rigid boosted templates, where the number of stages used for classification can be adaptively controlled depending on the content of the search window. This detector can be implemented with more modular hardware than support vector machine (SVM) and deformable parts model (DPM) designs. By jointly optimizing the algorithm and the efficient hardware architecture, the prototype chip implemented in 65 nm CMOS demonstrates real-time object detection at 20-50 frames/s with low power consumption of 22.5-181.7 mW (0.54-1.75 nJ/pixel) at 0.58-1.1 V supply.
KW - Object detection
KW - classification
KW - low-power
KW - machine learning
KW - real-time
KW - special-purpose accelerator
UR - http://www.scopus.com/inward/record.url?scp=85072973325&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072973325&partnerID=8YFLogxK
U2 - 10.1109/TCSI.2019.2921714
DO - 10.1109/TCSI.2019.2921714
M3 - Article
AN - SCOPUS:85072973325
SN - 1549-8328
VL - 66
SP - 3843
EP - 3853
JO - IEEE Transactions on Circuits and Systems I: Regular Papers
JF - IEEE Transactions on Circuits and Systems I: Regular Papers
IS - 10
M1 - 8741167
ER -