End-to-End FPGA-based Object Detection Using Pipelined CNN and Non-Maximum Suppression

Anupreetham Anupreetham, Mohamed Ibrahim, Mathew Hall, Andrew Boutros, Ajay Kuzhively, Abinash Mohanty, Eriko Nurvitadhi, Vaughn Betz, Yu Cao, Jae Sun Seo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Object detection is an important computer vision task, with many applications in autonomous driving, smart surveillance, robotics, and other domains. Single-shot detectors (SSD) coupled with a convolutional neural network (CNN) for feature extraction can efficiently detect, classify and localize various objects in an input image with very high accuracy. In such systems, the convolution layers extract features and predict the bounding box locations for the detected objects as well as their confidence scores. Then, a non-maximum suppression (NMS) algorithm eliminates partially overlapping boxes and selects the bounding box with the highest score per class. However, these two components are strictly sequential; a conventional NMS algorithm needs to wait for all box predictions to be produced before processing them. This prohibits any overlap between the execution of the convolutional layers and NMS, resulting in significant latency overhead and throughput degradation. In this paper, we present a novel NMS algorithm that alleviates this bottleneck and enables a fully-pipelined hardware implementation. We also implement an end-to-end system for low-latency SSD-MobileNet-V1 object detection, which combines a state-of-the-art deeply-pipelined CNN accelerator with a custom hardware implementation of our novel NMS algorithm. As a result of our new algorithm, the NMS module adds a minimal latency overhead of only 0.13 μs to the SSD-MobileNet-V1 convolution layers. Our end-to-end object detection system implemented on an Intel Stratix 10 FPGA runs at a maximum operating frequency of 350 MHz, with a throughput of 609 frames-per-second and an end-to-end batch-1 latency of 2.4 ms. Our system achieves 1.5× higher throughput and 4.4× lower latency compared to the current state-of-the-art SSD-based object detection systems on FPGAs.
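
For readers unfamiliar with the bottleneck the abstract describes, the following is a minimal Python sketch of a conventional greedy NMS for a single class. It is not the paper's novel algorithm, whose details the abstract does not give. The names `iou` and `greedy_nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions made for illustration only. The point to notice is the sort over all scores: it cannot begin until every box prediction has been produced, which is why this form of NMS cannot overlap with the convolution layers.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Conventional greedy NMS for one class; returns indices of kept boxes."""
    # The global sort needs every score up front: this is the sequential
    # dependency between the CNN and NMS that the abstract refers to.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box
        # that was already kept (all of which have higher or equal scores).
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(greedy_nms(boxes, scores))  # prints [0, 2]
```

In the usage example, the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the third box does not overlap either and is kept. A multi-class detector would typically run this once per class.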

Original language: English (US)
Title of host publication: Proceedings - 2021 31st International Conference on Field-Programmable Logic and Applications, FPL 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 76-82
Number of pages: 7
ISBN (Electronic): 9781665437592
DOIs
State: Published - 2021
Event: 31st International Conference on Field-Programmable Logic and Applications, FPL 2021 - Virtual, Dresden, Germany
Duration: Aug 30 2021 to Sep 3 2021

Publication series

Name: Proceedings - 2021 31st International Conference on Field-Programmable Logic and Applications, FPL 2021

Conference

Conference: 31st International Conference on Field-Programmable Logic and Applications, FPL 2021
Country/Territory: Germany
City: Virtual, Dresden
Period: 8/30/21 to 9/3/21

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Hardware and Architecture
  • Computer Science Applications
  • Software
