In recent years, the most popular video-based human action recognition methods have relied on extracting feature representations with Convolutional Neural Networks (CNNs) and then using these representations to classify actions. In this work, we propose a fast and accurate video representation derived from the motion-salient region (MSR), which captures the features most useful for action labeling. By improving a well-performing foreground detection technique, the region of interest (ROI) corresponding to actors in the foreground can be detected in both the appearance and the motion field under various realistic challenges. Furthermore, we propose a complementary motion saliency measure to select a secondary ROI: the major moving part of the human body. Accordingly, an MSR-based CNN descriptor (MSR-CNN) is formulated to recognize human actions, incorporating appearance and motion features along with tracks of the MSR. The computation is efficient for two reasons: 1) only part of the RGB image and the motion field needs to be processed; 2) less data is fed into the CNN feature extraction. Comparative evaluation on the JHMDB and UCF Sports datasets shows that our method outperforms the state-of-the-art in both efficiency and accuracy.
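The MSR pipeline sketched in the abstract (detect a motion-salient region, crop the frame to it, then extract CNN features from the crop) can be illustrated with a minimal toy example. Note that `motion_salient_roi`, `msr_crop`, and `cnn_features` below are hypothetical stand-ins of our own naming: simple frame differencing replaces the paper's improved foreground detector, and mean pooling replaces the actual CNN.

```python
import numpy as np

def motion_salient_roi(prev_frame, frame, thresh=25):
    """Bounding box of the motion-salient region via frame differencing.
    This is a stand-in for the paper's foreground detection, not the real method."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16)).max(axis=2)
    ys, xs = np.nonzero(diff > thresh)
    if ys.size == 0:
        h, w = diff.shape
        return 0, 0, h, w  # no motion detected: fall back to the full frame
    return ys.min(), xs.min(), ys.max() + 1, xs.max() + 1

def msr_crop(frame, roi):
    """Restrict processing to the ROI, so less data reaches feature extraction."""
    y0, x0, y1, x1 = roi
    return frame[y0:y1, x0:x1]

def cnn_features(patch):
    """Placeholder for CNN feature extraction: mean-pool each colour channel."""
    return patch.reshape(-1, patch.shape[-1]).mean(axis=0)

# Toy usage: a bright square "actor" appears between two frames.
prev = np.zeros((32, 32, 3), np.uint8)
cur = prev.copy()
cur[10:20, 12:22] = 255
roi = motion_salient_roi(prev, cur)       # bounding box of the moving square
feat = cnn_features(msr_crop(cur, roi))   # features from the cropped region only
```

The efficiency claim in the abstract follows directly from the crop: feature extraction sees only the ROI pixels rather than the full frame.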