MoNet3D: Towards accurate monocular 3d object localization in real time

Xichuan Zhou, Yicong Peng, Chunqiao Long, Fengbo Ren, Cong Shi

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Scopus citations

Abstract

Monocular multi-object detection and localization in 3D space has been proven to be a challenging task. The MoNet3D algorithm is a novel and effective framework that can predict the 3D position of each object in a monocular image and draw a 3D bounding box for each object. The MoNet3D method incorporates prior knowledge of the spatial geometric correlation of neighbouring objects into the deep neural network training process to improve the accuracy of 3D object localization. Experiments on the KITTI dataset show that the accuracy for predicting the depth and horizontal coordinates of objects in 3D space can reach 96.25% and 94.74%, respectively. Moreover, the method can realize the real-Time image processing at 27.85 FPS, showing promising potential for embedded advanced drivingassistance system applications. Our code is publicly available at https://github.com/CQUlearningsystemgroup/YicongPeng.

Original languageEnglish (US)
Title of host publication37th International Conference on Machine Learning, ICML 2020
EditorsHal Daume, Aarti Singh
PublisherInternational Machine Learning Society (IMLS)
Pages11440-11449
Number of pages10
ISBN (Electronic)9781713821120
StatePublished - 2020
Externally publishedYes
Event37th International Conference on Machine Learning, ICML 2020 - Virtual, Online
Duration: Jul 13 2020Jul 18 2020

Publication series

Name37th International Conference on Machine Learning, ICML 2020
VolumePartF168147-15

Conference

Conference37th International Conference on Machine Learning, ICML 2020
CityVirtual, Online
Period7/13/207/18/20

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Fingerprint

Dive into the research topics of 'MoNet3D: Towards accurate monocular 3d object localization in real time'. Together they form a unique fingerprint.

Cite this