Abstract

The ubiquity of video-based surveillance demands automated approaches to analyzing the ever-increasing volume of video footage. Action/event localization and recognition are two critical capabilities in surveillance video analysis, but they have largely been addressed separately in the literature. In this paper, we propose an approach that simultaneously localizes and recognizes visual events in raw surveillance videos using an end-to-end learning strategy. Our approach formulates the task as weakly supervised sequential semantic segmentation, in which a specific convolutional RNN captures not only appearance and motion information but also their temporal evolution patterns. We tested our approach on the VIRAT 2.0 dataset. The experimental results, compared with relevant existing state-of-the-art methods, suggest that the proposed approach is promising for delivering a practical solution.
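
To make the "sequential semantic segmentation" framing concrete, the following is a minimal sketch of how a convolutional recurrent cell can turn a frame sequence into per-frame, per-pixel class maps. It is not the paper's network: the abstract does not specify the cell, so a plain tanh convolutional RNN is assumed here, the weights are random and untrained, and the weakly supervised training step is omitted. All names (`conv2d_same`, `ConvRNNSegmenter`, `segment`) and sizes are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' 2D cross-correlation.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k. Returns (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, height, width = x.shape
    out = np.zeros((c_out, height, width))
    # Accumulate the contribution of each kernel tap over shifted views.
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + height, j:j + width]
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], patch)
    return out


class ConvRNNSegmenter:
    """Assumed stand-in for the paper's unspecified convolutional RNN."""

    def __init__(self, in_ch, hid_ch, n_classes, k=3, seed=0):
        rng = np.random.default_rng(seed)
        self.hid_ch = hid_ch
        self.w_x = rng.normal(0.0, 0.1, (hid_ch, in_ch, k, k))    # input -> hidden
        self.w_h = rng.normal(0.0, 0.1, (hid_ch, hid_ch, k, k))   # hidden -> hidden
        self.b = np.zeros((hid_ch, 1, 1))
        self.w_out = rng.normal(0.0, 0.1, (n_classes, hid_ch, 1, 1))  # 1x1 classifier

    def forward(self, frames):
        """frames: (T, C, H, W). Returns class logits of shape (T, n_classes, H, W)."""
        n_frames, _, height, width = frames.shape
        h = np.zeros((self.hid_ch, height, width))
        logits = []
        for t in range(n_frames):
            # The hidden state is a feature map, so spatial layout is preserved
            # while information from earlier frames is carried forward.
            h = np.tanh(conv2d_same(frames[t], self.w_x)
                        + conv2d_same(h, self.w_h) + self.b)
            logits.append(conv2d_same(h, self.w_out))
        return np.stack(logits)


def segment(model, frames):
    """Per-pixel class index for every frame, shape (T, H, W)."""
    return model.forward(frames).argmax(axis=1)


if __name__ == "__main__":
    clip = np.random.default_rng(1).random((4, 3, 8, 8))  # 4 RGB frames, 8x8 pixels
    model = ConvRNNSegmenter(in_ch=3, hid_ch=5, n_classes=3)
    print(segment(model, clip).shape)  # (4, 8, 8)
```

Under this framing, and treating class 0 as background (another assumption), the non-background pixels of each map give the localization and their class index gives the recognized event, so both outputs come from a single forward pass.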

Original language: English (US)
Title of host publication: Proceedings of AVSS 2018 - 2018 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781538692943
State: Published - Jul 2 2018
Event: 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2018 - Auckland, New Zealand
Duration: Nov 27 2018 to Nov 30 2018

Publication series

Name: Proceedings of AVSS 2018 - 2018 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance

Conference

Conference: 15th IEEE International Conference on Advanced Video and Signal-Based Surveillance, AVSS 2018
Country/Territory: New Zealand
City: Auckland
Period: 11/27/18 to 11/30/18

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Media Technology

Fingerprint

Dive into the research topics of 'Simultaneous Event Localization and Recognition in Surveillance Video'. Together they form a unique fingerprint.