A Real-Time 17-Scale Object Detection Accelerator with Adaptive 2000-Stage Classification in 65 nm CMOS

Minkyu Kim; Abinash Mohanty; Deepak Kadetotad; Luning Wei; Xiaofei He; Yu Cao; Jae Sun Seo

doi:10.1109/TCSI.2019.2921714

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Machine learning has become ubiquitous in applications including object detection, image/video classification, and natural language processing. While machine learning algorithms have been successfully used in many practical applications, accurate, fast, and low-power hardware implementations of such algorithms remain challenging, especially for mobile systems such as Internet of Things (IoT) devices, autonomous vehicles, and smart drones. This paper presents an energy-efficient programmable ASIC accelerator for object detection. Our ASIC accelerator supports programmable multi-class detection (e.g., face, traffic sign, car license plate, and pedestrian), many-object detection (up to 50 objects in one image) at different sizes (17-scale support with 6 down-/11 up-scaling), and high accuracy (AP of 0.87/0.81/0.72/0.76 for the FDDB/AFW/BTSD/Caltech datasets). We designed an integral channel detector with 2,000 classifiers for rigid boosted templates, where the number of stages used for classification can be adaptively controlled depending on the content of the search window. This can be implemented with more modular hardware than support vector machine (SVM) and deformable parts model (DPM) designs. By jointly optimizing the algorithm and the efficient hardware architecture, the prototype chip implemented in 65 nm CMOS demonstrates real-time object detection at 20-50 frames/s with low power consumption of 22.5-181.7 mW (0.54-1.75 nJ/pixel) at 0.58-1.1 V supply.
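The adaptive stage count mentioned in the abstract can be illustrated with a generic early-exit boosted classifier. The Python sketch below is an assumption-laden illustration, not the paper's implementation: the names `classify_window` and `make_stump`, the rejection thresholds, and the toy features are all hypothetical, and the early-exit rule shown is the common soft-cascade idea rather than anything the abstract specifies.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a weak classifier maps a window's features to a score.
WeakClassifier = Callable[[Sequence[float]], float]


def classify_window(
    features: Sequence[float],
    stages: List[WeakClassifier],
    reject_thresholds: Sequence[float],
) -> Tuple[bool, int]:
    """Evaluate boosted stages in order and stop as soon as the running
    score falls below that stage's rejection threshold.

    Returns (accepted, number_of_stages_evaluated).
    """
    score = 0.0
    used = 0
    for stage, threshold in zip(stages, reject_thresholds):
        score += stage(features)
        used += 1
        if score < threshold:
            # Early exit: this window is rejected without running later stages.
            return False, used
    return True, used


def make_stump(index: int, threshold: float, weight: float) -> WeakClassifier:
    """Build a toy decision stump (assumed form of a weak classifier)."""
    def stump(features: Sequence[float]) -> float:
        return weight if features[index] > threshold else -weight
    return stump


# 2,000 toy stages, matching the stage count quoted in the abstract.
stages = [make_stump(i % 4, 0.5, 1.0) for i in range(2000)]
reject = [-2.0] * 2000  # assumed constant rejection threshold

background = [0.1, 0.2, 0.0, 0.3]   # every stump votes -1
object_like = [0.9, 0.8, 0.7, 0.6]  # every stump votes +1

print(classify_window(background, stages, reject))   # rejected after 3 stages
print(classify_window(object_like, stages, reject))  # runs all 2000 stages
```

In this toy setup, a background-like window is discarded after three stages while an object-like window is evaluated by all 2,000, which is the kind of content-dependent workload the abstract describes.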

Original language	English (US)
Article number	8741167
Pages (from-to)	3843-3853
Number of pages	11
Journal	IEEE Transactions on Circuits and Systems I: Regular Papers
Volume	66
Issue number	10
DOIs	https://doi.org/10.1109/TCSI.2019.2921714
State	Published - Oct 2019

Keywords

Object detection
classification
low-power
machine learning
real-time
special-purpose accelerator

ASJC Scopus subject areas

Hardware and Architecture
Electrical and Electronic Engineering

Access to Document

10.1109/TCSI.2019.2921714

Cite this

@article{863fce2af07e4c4cb2bf12d8134df883,

title = "A Real-Time 17-Scale Object Detection Accelerator with Adaptive 2000-Stage Classification in 65 nm CMOS",

abstract = "Machine learning has become ubiquitous in applications including object detection, image/video classification, and natural language processing. While machine learning algorithms have been successfully used in many practical applications, accurate, fast, and low-power hardware implementations of such algorithms remain challenging, especially for mobile systems such as Internet of Things (IoT) devices, autonomous vehicles, and smart drones. This paper presents an energy-efficient programmable ASIC accelerator for object detection. Our ASIC accelerator supports programmable multi-class detection (e.g., face, traffic sign, car license plate, and pedestrian), many-object detection (up to 50 objects in one image) at different sizes (17-scale support with 6 down-/11 up-scaling), and high accuracy (AP of 0.87/0.81/0.72/0.76 for the FDDB/AFW/BTSD/Caltech datasets). We designed an integral channel detector with 2,000 classifiers for rigid boosted templates, where the number of stages used for classification can be adaptively controlled depending on the content of the search window. This can be implemented with more modular hardware than support vector machine (SVM) and deformable parts model (DPM) designs. By jointly optimizing the algorithm and the efficient hardware architecture, the prototype chip implemented in 65 nm CMOS demonstrates real-time object detection at 20-50 frames/s with low power consumption of 22.5-181.7 mW (0.54-1.75 nJ/pixel) at 0.58-1.1 V supply.",

keywords = "Object detection, classification, low-power, machine learning, real-time, special-purpose accelerator",

author = "Minkyu Kim and Abinash Mohanty and Deepak Kadetotad and Luning Wei and Xiaofei He and Yu Cao and Seo, {Jae Sun}",

note = "Funding Information: Manuscript received January 6, 2019; revised April 24, 2019; accepted May 30, 2019. Date of publication June 19, 2019; date of current version September 27, 2019. This work was supported in part by NSF under Grant NSF-CCF-1652866, and in part by the Center for Brain-Inspired Computing (C-BRIC), one of six centers in Joint University Microelectronics Program (JUMP), a Semiconductor Research Corporation (SRC) program sponsored by the Defense Advanced Research Projects Agency (DARPA). This paper was recommended by Associate Editor P. K. Meher. (Corresponding author: Minkyu Kim.) M. Kim, A. Mohanty, D. Kadetotad, Y. Cao, and J.-S. Seo are with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA (e-mail: mkim152@asu.edu). Publisher Copyright: {\textcopyright} 2004-2012 IEEE.",

year = "2019",

month = oct,

doi = "10.1109/TCSI.2019.2921714",

language = "English (US)",

volume = "66",

pages = "3843--3853",

journal = "IEEE Transactions on Circuits and Systems I: Regular Papers",

issn = "1549-8328",

number = "10",

}

TY - JOUR

T1 - A Real-Time 17-Scale Object Detection Accelerator with Adaptive 2000-Stage Classification in 65 nm CMOS

AU - Kim, Minkyu

AU - Mohanty, Abinash

AU - Kadetotad, Deepak

AU - Wei, Luning

AU - He, Xiaofei

AU - Cao, Yu

AU - Seo, Jae Sun

N1 - Funding Information: Manuscript received January 6, 2019; revised April 24, 2019; accepted May 30, 2019. Date of publication June 19, 2019; date of current version September 27, 2019. This work was supported in part by NSF under Grant NSF-CCF-1652866, and in part by the Center for Brain-Inspired Computing (C-BRIC), one of six centers in Joint University Microelectronics Program (JUMP), a Semiconductor Research Corporation (SRC) program sponsored by the Defense Advanced Research Projects Agency (DARPA). This paper was recommended by Associate Editor P. K. Meher. (Corresponding author: Minkyu Kim.) M. Kim, A. Mohanty, D. Kadetotad, Y. Cao, and J.-S. Seo are with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA (e-mail: mkim152@asu.edu). Publisher Copyright: © 2004-2012 IEEE.

PY - 2019/10

Y1 - 2019/10

N2 - Machine learning has become ubiquitous in applications including object detection, image/video classification, and natural language processing. While machine learning algorithms have been successfully used in many practical applications, accurate, fast, and low-power hardware implementations of such algorithms remain challenging, especially for mobile systems such as Internet of Things (IoT) devices, autonomous vehicles, and smart drones. This paper presents an energy-efficient programmable ASIC accelerator for object detection. Our ASIC accelerator supports programmable multi-class detection (e.g., face, traffic sign, car license plate, and pedestrian), many-object detection (up to 50 objects in one image) at different sizes (17-scale support with 6 down-/11 up-scaling), and high accuracy (AP of 0.87/0.81/0.72/0.76 for the FDDB/AFW/BTSD/Caltech datasets). We designed an integral channel detector with 2,000 classifiers for rigid boosted templates, where the number of stages used for classification can be adaptively controlled depending on the content of the search window. This can be implemented with more modular hardware than support vector machine (SVM) and deformable parts model (DPM) designs. By jointly optimizing the algorithm and the efficient hardware architecture, the prototype chip implemented in 65 nm CMOS demonstrates real-time object detection at 20-50 frames/s with low power consumption of 22.5-181.7 mW (0.54-1.75 nJ/pixel) at 0.58-1.1 V supply.

KW - Object detection

KW - classification

KW - low-power

KW - machine learning

KW - real-time

KW - special-purpose accelerator

UR - http://www.scopus.com/inward/record.url?scp=85072973325&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072973325&partnerID=8YFLogxK

U2 - 10.1109/TCSI.2019.2921714

DO - 10.1109/TCSI.2019.2921714

M3 - Article

AN - SCOPUS:85072973325

SN - 1549-8328

VL - 66

SP - 3843

EP - 3853

JO - IEEE Transactions on Circuits and Systems I: Regular Papers

JF - IEEE Transactions on Circuits and Systems I: Regular Papers

IS - 10

M1 - 8741167

ER -
