A Real-Time 17-Scale Object Detection Accelerator with Adaptive 2000-Stage Classification in 65 nm CMOS

Minkyu Kim, Abinash Mohanty, Deepak Kadetotad, Luning Wei, Xiaofei He, Yu Cao, Jae Sun Seo

Research output: Contribution to journal › Article

Abstract

Machine learning has become ubiquitous in applications including object detection, image/video classification, and natural language processing. While machine learning algorithms have been successfully used in many practical applications, accurate, fast, and low-power hardware implementation of such algorithms remains challenging, especially for mobile systems such as Internet of Things (IoT) devices, autonomous vehicles, and smart drones. This paper presents an energy-efficient programmable ASIC accelerator for object detection. The accelerator supports programmable multi-class detection (e.g., face, traffic sign, car license plate, and pedestrian), many-object detection (up to 50 objects of different sizes in one image; 17-scale support with 6 down-scaling and 11 up-scaling), and high accuracy (AP of 0.87/0.81/0.72/0.76 on the FDDB/AFW/BTSD/Caltech datasets). We designed an integral channel detector with 2,000 classifiers for rigid boosted templates, in which the number of stages used for classification is adaptively controlled depending on the content of the search window. This detector can be implemented with more modular hardware than support vector machine (SVM) and deformable parts model (DPM) designs. By jointly optimizing the algorithm and the hardware architecture, the prototype chip, implemented in 65 nm CMOS, demonstrates real-time object detection at 20-50 frames/s with low power consumption of 22.5-181.7 mW (0.54-1.75 nJ/pixel) at a 0.58-1.1 V supply.
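
The adaptive stage control mentioned in the abstract can be read as early-exit evaluation of a boosted classifier: stage votes are accumulated for a search window, and evaluation stops as soon as the running score falls below a rejection threshold. The Python sketch below illustrates that general idea only. It is not the chip's actual logic. The names `classify_window` and `make_stump`, the per-stage rejection thresholds, and the toy four-element feature vectors are all hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A weak classifier maps a window's feature vector to a real-valued vote.
WeakClassifier = Callable[[Sequence[float]], float]


def make_stump(index: int, threshold: float, vote: float) -> WeakClassifier:
    """Hypothetical decision stump: vote +vote or -vote on one feature."""
    def stump(features: Sequence[float]) -> float:
        return vote if features[index] > threshold else -vote
    return stump


def classify_window(
    features: Sequence[float],
    stages: List[WeakClassifier],
    reject_thresholds: Sequence[float],
    final_threshold: float = 0.0,
) -> Tuple[bool, int]:
    """Return (is_object, stages_used) for one search window.

    The running score is checked after every stage; a window whose score
    drops below that stage's rejection threshold is discarded at once, so
    easy background windows use only a few of the available stages.
    """
    score = 0.0
    for used, (weak, reject_at) in enumerate(zip(stages, reject_thresholds), start=1):
        score += weak(features)
        if score < reject_at:
            return False, used  # early exit: remaining stages are skipped
    return score >= final_threshold, len(stages)


# Toy setup (assumed values): 2,000 stumps cycling over 4 features.
stages = [make_stump(i % 4, 0.5, 1.0) for i in range(2000)]
reject_thresholds = [-2.0] * len(stages)

background = [0.0, 0.0, 0.0, 0.0]   # every stump votes -1
candidate = [1.0, 1.0, 1.0, 1.0]    # every stump votes +1

print(classify_window(background, stages, reject_thresholds))
print(classify_window(candidate, stages, reject_thresholds))
```

In this toy setup the background window is rejected after a handful of stages while the candidate window runs through all of them, which is the kind of content-dependent workload the abstract refers to.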

Original language: English (US)
Article number: 8741167
Pages (from-to): 3843-3853
Number of pages: 11
Journal: IEEE Transactions on Circuits and Systems I: Regular Papers
Volume: 66
Issue number: 10
DOI: 10.1109/TCSI.2019.2921714
State: Published - Oct 2019

Fingerprint

Particle accelerators
Application specific integrated circuits
Hardware
Learning systems
Traffic signs
Learning algorithms
Support vector machines
Electric power utilization
Classifiers
Railroad cars
Pixels
Detectors
Processing
Object detection
Drones
Internet of things

Keywords

  • classification
  • low-power
  • machine learning
  • object detection
  • real-time
  • special-purpose accelerator

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

A Real-Time 17-Scale Object Detection Accelerator with Adaptive 2000-Stage Classification in 65 nm CMOS. / Kim, Minkyu; Mohanty, Abinash; Kadetotad, Deepak; Wei, Luning; He, Xiaofei; Cao, Yu; Seo, Jae Sun.

In: IEEE Transactions on Circuits and Systems I: Regular Papers, Vol. 66, No. 10, Article 8741167, Oct. 2019, pp. 3843-3853.

Kim, Minkyu ; Mohanty, Abinash ; Kadetotad, Deepak ; Wei, Luning ; He, Xiaofei ; Cao, Yu ; Seo, Jae Sun. / A Real-Time 17-Scale Object Detection Accelerator with Adaptive 2000-Stage Classification in 65 nm CMOS. In: IEEE Transactions on Circuits and Systems I: Regular Papers. 2019 ; Vol. 66, No. 10. pp. 3843-3853.
@article{863fce2af07e4c4cb2bf12d8134df883,
title = "A Real-Time 17-Scale Object Detection Accelerator with Adaptive 2000-Stage Classification in 65 nm CMOS",
abstract = "Machine learning has become ubiquitous in applications including object detection, image/video classification, and natural language processing. While machine learning algorithms have been successfully used in many practical applications, accurate, fast, and low-power hardware implementation of such algorithms remains challenging, especially for mobile systems such as Internet of Things (IoT) devices, autonomous vehicles, and smart drones. This paper presents an energy-efficient programmable ASIC accelerator for object detection. The accelerator supports programmable multi-class detection (e.g., face, traffic sign, car license plate, and pedestrian), many-object detection (up to 50 objects of different sizes in one image; 17-scale support with 6 down-scaling and 11 up-scaling), and high accuracy (AP of 0.87/0.81/0.72/0.76 on the FDDB/AFW/BTSD/Caltech datasets). We designed an integral channel detector with 2,000 classifiers for rigid boosted templates, in which the number of stages used for classification is adaptively controlled depending on the content of the search window. This detector can be implemented with more modular hardware than support vector machine (SVM) and deformable parts model (DPM) designs. By jointly optimizing the algorithm and the hardware architecture, the prototype chip, implemented in 65 nm CMOS, demonstrates real-time object detection at 20-50 frames/s with low power consumption of 22.5-181.7 mW (0.54-1.75 nJ/pixel) at a 0.58-1.1 V supply.",
keywords = "classification, low-power, machine learning, object detection, real-time, special-purpose accelerator",
author = "Minkyu Kim and Abinash Mohanty and Deepak Kadetotad and Luning Wei and Xiaofei He and Yu Cao and Seo, {Jae Sun}",
year = "2019",
month = "10",
doi = "10.1109/TCSI.2019.2921714",
language = "English (US)",
volume = "66",
pages = "3843--3853",
journal = "IEEE Transactions on Circuits and Systems I: Regular Papers",
issn = "1549-8328",
number = "10",

}

TY - JOUR

T1 - A Real-Time 17-Scale Object Detection Accelerator with Adaptive 2000-Stage Classification in 65 nm CMOS

AU - Kim, Minkyu

AU - Mohanty, Abinash

AU - Kadetotad, Deepak

AU - Wei, Luning

AU - He, Xiaofei

AU - Cao, Yu

AU - Seo, Jae Sun

PY - 2019/10

Y1 - 2019/10

N2 - Machine learning has become ubiquitous in applications including object detection, image/video classification, and natural language processing. While machine learning algorithms have been successfully used in many practical applications, accurate, fast, and low-power hardware implementation of such algorithms remains challenging, especially for mobile systems such as Internet of Things (IoT) devices, autonomous vehicles, and smart drones. This paper presents an energy-efficient programmable ASIC accelerator for object detection. The accelerator supports programmable multi-class detection (e.g., face, traffic sign, car license plate, and pedestrian), many-object detection (up to 50 objects of different sizes in one image; 17-scale support with 6 down-scaling and 11 up-scaling), and high accuracy (AP of 0.87/0.81/0.72/0.76 on the FDDB/AFW/BTSD/Caltech datasets). We designed an integral channel detector with 2,000 classifiers for rigid boosted templates, in which the number of stages used for classification is adaptively controlled depending on the content of the search window. This detector can be implemented with more modular hardware than support vector machine (SVM) and deformable parts model (DPM) designs. By jointly optimizing the algorithm and the hardware architecture, the prototype chip, implemented in 65 nm CMOS, demonstrates real-time object detection at 20-50 frames/s with low power consumption of 22.5-181.7 mW (0.54-1.75 nJ/pixel) at a 0.58-1.1 V supply.

KW - classification

KW - low-power

KW - machine learning

KW - object detection

KW - real-time

KW - special-purpose accelerator

UR - http://www.scopus.com/inward/record.url?scp=85072973325&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072973325&partnerID=8YFLogxK

U2 - 10.1109/TCSI.2019.2921714

DO - 10.1109/TCSI.2019.2921714

M3 - Article

AN - SCOPUS:85072973325

VL - 66

SP - 3843

EP - 3853

JO - IEEE Transactions on Circuits and Systems I: Regular Papers

JF - IEEE Transactions on Circuits and Systems I: Regular Papers

SN - 1549-8328

IS - 10

M1 - 8741167

ER -