From videos to verbs: Mining videos for activities using a cascade of dynamical systems

Pavan K. Turaga; Ashok Veeraraghavan; Rama Chellappa

doi:10.1109/CVPR.2007.383170

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

35 Scopus citations

Abstract

Clustering video sequences in order to infer and extract activities from a single video stream is an extremely important problem and has significant potential in video indexing, surveillance, activity discovery and event recognition. Clustering a video sequence into activities requires one to simultaneously recognize activity boundaries (activity consistent subsequences) and cluster these activity subsequences. In order to do this, we build a generative model for activities (in video) using a cascade of dynamical systems and show that this model is able to capture and represent a diverse class of activities. We then derive algorithms to learn the model parameters from a video stream and also show how a single video sequence may be clustered into different clusters where each cluster represents an activity. We also propose a novel technique to build affine, view, rate invariance of the activity into the distance metric for clustering. Experiments show that the clusters found by the algorithm correspond to semantically meaningful activities.

Original language	English (US)
Title of host publication	2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07
DOIs	https://doi.org/10.1109/CVPR.2007.383170
State	Published - 2007
Externally published	Yes
Event	2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07 - Minneapolis, MN, United States Duration: Jun 17 2007 → Jun 22 2007

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)	1063-6919

Other

Other	2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07
Country/Territory	United States
City	Minneapolis, MN
Period	6/17/07 → 6/22/07

ASJC Scopus subject areas

Software
Computer Vision and Pattern Recognition

Cite this

Turaga, P. K., Veeraraghavan, A., & Chellappa, R. (2007). From videos to verbs: Mining videos for activities using a cascade of dynamical systems. In 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07, Article 4270195 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). https://doi.org/10.1109/CVPR.2007.383170

From videos to verbs: Mining videos for activities using a cascade of dynamical systems. / Turaga, Pavan K.; Veeraraghavan, Ashok; Chellappa, Rama.
2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07. 2007. 4270195 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

Turaga, PK, Veeraraghavan, A & Chellappa, R 2007, From videos to verbs: Mining videos for activities using a cascade of dynamical systems. in 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07, 4270195, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07, Minneapolis, MN, United States, 6/17/07. https://doi.org/10.1109/CVPR.2007.383170

@inproceedings{34ad91356a98430a8c5a01cb0601d62b,
  title = "From videos to verbs: Mining videos for activities using a cascade of dynamical systems",
  abstract = "Clustering video sequences in order to infer and extract activities from a single video stream is an extremely important problem and has significant potential in video indexing, surveillance, activity discovery and event recognition. Clustering a video sequence into activities requires one to simultaneously recognize activity boundaries (activity consistent subsequences) and cluster these activity subsequences. In order to do this, we build a generative model for activities (in video) using a cascade of dynamical systems and show that this model is able to capture and represent a diverse class of activities. We then derive algorithms to learn the model parameters from a video stream and also show how a single video sequence may be clustered into different clusters where each cluster represents an activity. We also propose a novel technique to build affine, view, rate invariance of the activity into the distance metric for clustering. Experiments show that the clusters found by the algorithm correspond to semantically meaningful activities.",
  author = "Turaga, {Pavan K.} and Ashok Veeraraghavan and Rama Chellappa",
  year = "2007",
  doi = "10.1109/CVPR.2007.383170",
  language = "English (US)",
  isbn = "1424411807",
  series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",
  booktitle = "2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07",
  note = "2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07 ; Conference date: 17-06-2007 Through 22-06-2007",
}

TY  - GEN
T1  - From videos to verbs
T2  - 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07
AU  - Turaga, Pavan K.
AU  - Veeraraghavan, Ashok
AU  - Chellappa, Rama
PY  - 2007
Y1  - 2007
N2  - Clustering video sequences in order to infer and extract activities from a single video stream is an extremely important problem and has significant potential in video indexing, surveillance, activity discovery and event recognition. Clustering a video sequence into activities requires one to simultaneously recognize activity boundaries (activity consistent subsequences) and cluster these activity subsequences. In order to do this, we build a generative model for activities (in video) using a cascade of dynamical systems and show that this model is able to capture and represent a diverse class of activities. We then derive algorithms to learn the model parameters from a video stream and also show how a single video sequence may be clustered into different clusters where each cluster represents an activity. We also propose a novel technique to build affine, view, rate invariance of the activity into the distance metric for clustering. Experiments show that the clusters found by the algorithm correspond to semantically meaningful activities.
AB  - Clustering video sequences in order to infer and extract activities from a single video stream is an extremely important problem and has significant potential in video indexing, surveillance, activity discovery and event recognition. Clustering a video sequence into activities requires one to simultaneously recognize activity boundaries (activity consistent subsequences) and cluster these activity subsequences. In order to do this, we build a generative model for activities (in video) using a cascade of dynamical systems and show that this model is able to capture and represent a diverse class of activities. We then derive algorithms to learn the model parameters from a video stream and also show how a single video sequence may be clustered into different clusters where each cluster represents an activity. We also propose a novel technique to build affine, view, rate invariance of the activity into the distance metric for clustering. Experiments show that the clusters found by the algorithm correspond to semantically meaningful activities.
UR  - http://www.scopus.com/inward/record.url?scp=34948834193&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34948834193&partnerID=8YFLogxK
U2  - 10.1109/CVPR.2007.383170
DO  - 10.1109/CVPR.2007.383170
M3  - Conference contribution
AN  - SCOPUS:34948834193
SN  - 1424411807
SN  - 9781424411801
T3  - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
BT  - 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR'07
Y2  - 17 June 2007 through 22 June 2007
ER  -
