TY - GEN
T1 - MSR-CNN
T2 - 23rd International Conference on Pattern Recognition, ICPR 2016
AU - Tu, Zhigang
AU - Cao, Jun
AU - Li, Yikang
AU - Li, Baoxin
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/1/1
Y1 - 2016/1/1
N2 - In recent years, the most popular video-based human action recognition methods have relied on extracting feature representations using Convolutional Neural Networks (CNNs) and then using these representations to classify actions. In this work, we propose a fast and accurate video representation derived from the motion-salient region (MSR), which captures the features most useful for action labeling. By improving a well-performing foreground detection technique, the region of interest (ROI) corresponding to actors in the foreground can be detected in both the appearance and the motion field under various realistic challenges. Furthermore, we propose a complementary motion-salient measure to select a secondary ROI: the major moving part of the human. Accordingly, an MSR-based CNN descriptor (MSR-CNN) is formulated to recognize human action, where the descriptor incorporates appearance and motion features along with tracks of the MSR. The computation can be implemented efficiently due to two characteristics: 1) only part of the RGB image and the motion field needs to be processed; and 2) less data is used as input for CNN feature extraction. Comparative evaluation on the JHMDB and UCF Sports datasets shows that our method outperforms the state of the art in both efficiency and accuracy.
AB - In recent years, the most popular video-based human action recognition methods have relied on extracting feature representations using Convolutional Neural Networks (CNNs) and then using these representations to classify actions. In this work, we propose a fast and accurate video representation derived from the motion-salient region (MSR), which captures the features most useful for action labeling. By improving a well-performing foreground detection technique, the region of interest (ROI) corresponding to actors in the foreground can be detected in both the appearance and the motion field under various realistic challenges. Furthermore, we propose a complementary motion-salient measure to select a secondary ROI: the major moving part of the human. Accordingly, an MSR-based CNN descriptor (MSR-CNN) is formulated to recognize human action, where the descriptor incorporates appearance and motion features along with tracks of the MSR. The computation can be implemented efficiently due to two characteristics: 1) only part of the RGB image and the motion field needs to be processed; and 2) less data is used as input for CNN feature extraction. Comparative evaluation on the JHMDB and UCF Sports datasets shows that our method outperforms the state of the art in both efficiency and accuracy.
KW - Action recognition
KW - Convolutional Neural Networks
KW - Motion salient regions
UR - http://www.scopus.com/inward/record.url?scp=85019149204&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85019149204&partnerID=8YFLogxK
U2 - 10.1109/ICPR.2016.7900180
DO - 10.1109/ICPR.2016.7900180
M3 - Conference contribution
AN - SCOPUS:85019149204
T3 - Proceedings - International Conference on Pattern Recognition
SP - 3524
EP - 3529
BT - 2016 23rd International Conference on Pattern Recognition, ICPR 2016
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 4 December 2016 through 8 December 2016
ER -