TY - GEN
T1 - Human activity encoding and recognition using low-level visual features
AU - Wang, Zheshen
AU - Li, Baoxin
PY - 2009/1/1
Y1 - 2009/1/1
N2 - Automatic recognition of human activities is among the key capabilities of many intelligent systems with vision/perception. Most existing approaches to this problem require sophisticated feature extraction before classification can be performed. This paper presents a novel approach for human action recognition using only simple low-level visual features: motion captured from direct frame differencing. A codebook of key poses is first created from the training data through unsupervised clustering. Videos of actions are then coded as sequences of super-frames, defined as the key poses augmented with discriminative attributes. A weighted-sequence distance is proposed for comparing two super-frame sequences, which is further wrapped as a kernel embedded in an SVM classifier for the final classification. Compared with conventional methods, our approach provides a flexible non-parametric sequential structure with a corresponding distance measure for human action representation and classification without requiring complex feature extraction. The effectiveness of our approach is demonstrated with the widely used KTH human activity dataset, for which the proposed method outperforms the existing state of the art.
UR - http://www.scopus.com/inward/record.url?scp=78751692959&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78751692959&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:78751692959
SN - 9781577354260
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 1876
EP - 1882
BT - IJCAI-09 - Proceedings of the 21st International Joint Conference on Artificial Intelligence
PB - International Joint Conferences on Artificial Intelligence
T2 - 21st International Joint Conference on Artificial Intelligence, IJCAI 2009
Y2 - 11 July 2009 through 16 July 2009
ER -