Human activity encoding and recognition using low-level visual features

Zheshen Wang, Baoxin Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

Automatic recognition of human activities is among the key capabilities of many intelligent systems with vision/perception. Most existing approaches to this problem require sophisticated feature extraction before classification can be performed. This paper presents a novel approach for human action recognition using only simple low-level visual features: motion captured from direct frame differencing. A codebook of key poses is first created from the training data through unsupervised clustering. Videos of actions are then coded as sequences of super-frames, defined as the key poses augmented with discriminative attributes. A weighted-sequence distance is proposed for comparing two super-frame sequences, which is further wrapped as a kernel embedded in an SVM classifier for the final classification. Compared with conventional methods, our approach provides a flexible non-parametric sequential structure with a corresponding distance measure for human action representation and classification without requiring complex feature extraction. The effectiveness of our approach is demonstrated with the widely-used KTH human activity dataset, for which the proposed method outperforms the existing state of the art.
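The pipeline the abstract describes (frame differencing, a key-pose codebook built by unsupervised clustering, then coding each video as a sequence of codebook indices) can be sketched roughly as follows. This is a minimal illustration only: it substitutes plain k-means for the paper's clustering step, omits the super-frame attributes and the weighted-sequence kernel, and every function name is an assumption, not the authors' implementation.

```python
import numpy as np

def frame_difference_features(frames):
    """Low-level motion features: absolute difference of consecutive frames,
    flattened to one vector per frame pair (direct frame differencing; the
    paper's exact preprocessing may differ)."""
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))   # shape (T-1, H, W)
    return diffs.reshape(diffs.shape[0], -1)  # shape (T-1, H*W)

def build_codebook(features, k, iters=20, seed=0):
    """Unsupervised clustering of motion features into k 'key pose' centroids.
    Plain k-means is used here as an assumed stand-in."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign every feature vector to its nearest centroid, then re-center
        d = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return centroids

def encode_video(frames, codebook):
    """Code a video as the sequence of nearest key-pose indices."""
    f = frame_difference_features(frames)
    d = np.linalg.norm(f[:, None] - codebook[None], axis=2)
    return d.argmin(axis=1)
```

In the paper these index sequences are further augmented into super-frames and compared with a weighted-sequence distance used as an SVM kernel; the sketch above covers only the encoding stage.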

Original language: English (US)
Title of host publication: IJCAI International Joint Conference on Artificial Intelligence
Pages: 1876-1882
Number of pages: 7
State: Published - 2009
Event: 21st International Joint Conference on Artificial Intelligence, IJCAI-09 - Pasadena, CA, United States
Duration: Jul 11, 2009 - Jul 17, 2009



ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Wang, Z., & Li, B. (2009). Human activity encoding and recognition using low-level visual features. In IJCAI International Joint Conference on Artificial Intelligence (pp. 1876-1882).

@inproceedings{ce1f0280c503419ab69fc1e92e641ba0,
title = "Human activity encoding and recognition using low-level visual features",
author = "Zheshen Wang and Baoxin Li",
year = "2009",
language = "English (US)",
isbn = "9781577354260",
pages = "1876--1882",
booktitle = "IJCAI International Joint Conference on Artificial Intelligence",

}
