Machine recognition of human activities: A survey

Pavan Turaga, Rama Chellappa, V. S. Subrahmanian, Octavian Udrea

Research output: Contribution to journal, Article

960 Citations (Scopus)

Abstract

The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications, such as content-based video annotation and retrieval, highlight extraction, and video summarization, require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences, from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing (robustness against errors in low-level processing, view- and rate-invariant representations at mid-level processing, and semantic representation of human activities at higher-level processing) make this problem hard to solve. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) "actions" and 2) "activities." "Actions" are characterized by simple motion patterns typically executed by a single human. "Activities" are more complex and involve coordinated actions among a small number of humans. We discuss several approaches and classify them according to their ability to handle these varying degrees of complexity. We begin with approaches to modeling the simplest action classes, known as atomic or primitive actions, which do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher-level representation of complex activities.

Original language: English (US)
Article number: 4633644
Pages (from-to): 1473-1488
Number of pages: 16
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 18
Issue number: 11
DOI: 10.1109/TCSVT.2008.2005594
State: Published - Nov 2008
Externally published: Yes

Keywords

  • Human activity analysis
  • Image sequence analysis
  • Machine vision
  • Surveillance

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Media Technology

Cite this

Machine recognition of human activities: A survey. / Turaga, Pavan; Chellappa, Rama; Subrahmanian, V. S.; Udrea, Octavian.

In: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 18, No. 11, 4633644, 11.2008, p. 1473-1488.

@article{9f85cdd962ba44d4adcefbcae70f5ec8,
title = "Machine recognition of human activities: A survey",
keywords = "Human activity analysis, Image sequence analysis, Machine vision, Surveillance",
author = "Turaga, Pavan and Chellappa, Rama and Subrahmanian, {V. S.} and Udrea, Octavian",
year = "2008",
month = "11",
doi = "10.1109/TCSVT.2008.2005594",
language = "English (US)",
volume = "18",
pages = "1473--1488",
journal = "IEEE Transactions on Circuits and Systems for Video Technology",
issn = "1051-8215",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "11",
}

TY - JOUR

T1 - Machine recognition of human activities: A survey

AU - Turaga, Pavan

AU - Chellappa, Rama

AU - Subrahmanian, V. S.

AU - Udrea, Octavian

PY - 2008/11

Y1 - 2008/11

KW - Human activity analysis

KW - Image sequence analysis

KW - Machine vision

KW - Surveillance

UR - http://www.scopus.com/inward/record.url?scp=55149089260&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=55149089260&partnerID=8YFLogxK

U2 - 10.1109/TCSVT.2008.2005594

DO - 10.1109/TCSVT.2008.2005594

M3 - Article

AN - SCOPUS:55149089260

VL - 18

SP - 1473

EP - 1488

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

SN - 1051-8215

IS - 11

M1 - 4633644

ER -