Learning action dictionaries from video

Pavan Turaga; Rama Chellappa

doi:10.1109/ICIP.2008.4712102

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Summarizing the contents of a video containing human activities is an important problem in computer vision and has important applications in automated surveillance systems. Summarizing a video requires one to identify and learn a 'vocabulary' of action-phrases corresponding to specific events and actions occurring in the video. We propose a generative model for dynamic scenes containing human activities as a composition of independent action-phrases - each of which is derived from an underlying vocabulary. Given a long video sequence, we propose a completely unsupervised approach to learn the vocabulary. Once the vocabulary is learnt, a video segment can be decomposed into a collection of phrases for summarization. We then describe methods to learn the correlations between activities and sequentiality of events. We also propose a novel method for building invariances to spatial transforms in the summarization scheme.

Original language	English (US)
Title of host publication	2008 IEEE International Conference on Image Processing, ICIP 2008 Proceedings
Pages	1704-1707
Number of pages	4
DOIs	https://doi.org/10.1109/ICIP.2008.4712102
State	Published - 2008
Externally published	Yes
Event	2008 IEEE International Conference on Image Processing, ICIP 2008 - San Diego, CA, United States
Duration	Oct 12 2008 → Oct 15 2008

Publication series

Name	Proceedings - International Conference on Image Processing, ICIP
ISSN (Print)	1522-4880

Other

Other	2008 IEEE International Conference on Image Processing, ICIP 2008
Country/Territory	United States
City	San Diego, CA
Period	10/12/08 → 10/15/08

Keywords

Activity analysis
Video summarization

ASJC Scopus subject areas

Software
Computer Vision and Pattern Recognition
Signal Processing

Cite this

Turaga, P & Chellappa, R 2008, Learning action dictionaries from video. in 2008 IEEE International Conference on Image Processing, ICIP 2008 Proceedings, 4712102, Proceedings - International Conference on Image Processing, ICIP, pp. 1704-1707, 2008 IEEE International Conference on Image Processing, ICIP 2008, San Diego, CA, United States, 10/12/08. https://doi.org/10.1109/ICIP.2008.4712102

@inproceedings{5cf8fabfdbb74415ab37597894a14bdc,
  title = "Learning action dictionaries from video",
  abstract = "Summarizing the contents of a video containing human activities is an important problem in computer vision and has important applications in automated surveillance systems. Summarizing a video requires one to identify and learn a 'vocabulary' of action-phrases corresponding to specific events and actions occurring in the video. We propose a generative model for dynamic scenes containing human activities as a composition of independent action-phrases - each of which is derived from an underlying vocabulary. Given a long video sequence, we propose a completely unsupervised approach to learn the vocabulary. Once the vocabulary is learnt, a video segment can be decomposed into a collection of phrases for summarization. We then describe methods to learn the correlations between activities and sequentiality of events. We also propose a novel method for building invariances to spatial transforms in the summarization scheme.",
  keywords = "Activity analysis, Video summarization",
  author = "Pavan Turaga and Rama Chellappa",
  year = "2008",
  doi = "10.1109/ICIP.2008.4712102",
  language = "English (US)",
  isbn = "1424417643",
  series = "Proceedings - International Conference on Image Processing, ICIP",
  pages = "1704--1707",
  booktitle = "2008 IEEE International Conference on Image Processing, ICIP 2008 Proceedings",
  note = "2008 IEEE International Conference on Image Processing, ICIP 2008 ; Conference date: 12-10-2008 Through 15-10-2008",
}

TY - GEN
T1 - Learning action dictionaries from video
AU - Turaga, Pavan
AU - Chellappa, Rama
PY - 2008
Y1 - 2008
N2 - Summarizing the contents of a video containing human activities is an important problem in computer vision and has important applications in automated surveillance systems. Summarizing a video requires one to identify and learn a 'vocabulary' of action-phrases corresponding to specific events and actions occurring in the video. We propose a generative model for dynamic scenes containing human activities as a composition of independent action-phrases - each of which is derived from an underlying vocabulary. Given a long video sequence, we propose a completely unsupervised approach to learn the vocabulary. Once the vocabulary is learnt, a video segment can be decomposed into a collection of phrases for summarization. We then describe methods to learn the correlations between activities and sequentiality of events. We also propose a novel method for building invariances to spatial transforms in the summarization scheme.
AB - Summarizing the contents of a video containing human activities is an important problem in computer vision and has important applications in automated surveillance systems. Summarizing a video requires one to identify and learn a 'vocabulary' of action-phrases corresponding to specific events and actions occurring in the video. We propose a generative model for dynamic scenes containing human activities as a composition of independent action-phrases - each of which is derived from an underlying vocabulary. Given a long video sequence, we propose a completely unsupervised approach to learn the vocabulary. Once the vocabulary is learnt, a video segment can be decomposed into a collection of phrases for summarization. We then describe methods to learn the correlations between activities and sequentiality of events. We also propose a novel method for building invariances to spatial transforms in the summarization scheme.
KW - Activity analysis
KW - Video summarization
UR - http://www.scopus.com/inward/record.url?scp=69949123416&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69949123416&partnerID=8YFLogxK
U2 - 10.1109/ICIP.2008.4712102
DO - 10.1109/ICIP.2008.4712102
M3 - Conference contribution
AN - SCOPUS:69949123416
SN - 1424417643
SN - 9781424417643
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 1704
EP - 1707
BT - 2008 IEEE International Conference on Image Processing, ICIP 2008 Proceedings
T2 - 2008 IEEE International Conference on Image Processing, ICIP 2008
Y2 - 12 October 2008 through 15 October 2008
ER -
