Recognizing unseen actions in a domain-adapted embedding space

Yikang Li; Sheng Hung Hu; Baoxin Li

doi:10.1109/ICIP.2016.7533150

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

17 Scopus citations

Abstract

With the continued growth of multimedia data, Zero-shot Learning (ZSL) techniques have attracted much attention in recent years for their ability to train models that can handle 'unseen' categories. Existing ZSL algorithms mainly rely on attribute-based semantic spaces and focus only on static image data. Moreover, most ZSL studies consider only semantically embedded labels and fail to address the domain shift problem. In this paper, we propose a deep two-output model for video ZSL and action recognition: spatial and temporal features are computed from video content by distinct Convolutional Neural Networks (CNNs), and a Multi-layer Perceptron (MLP) is trained on the extracted features to map videos to semantic embedding word vectors. We also introduce a domain adaptation strategy named 'ConSSEV', which combines the outputs of the two distinct output layers of our MLP to improve zero-shot learning results. Our experiments on the UCF101 dataset demonstrate that the proposed model shows greater advantages when more complex video embedding schemes are used, and that it outperforms state-of-the-art zero-shot learning techniques.
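The following is a minimal sketch of the zero-shot inference step summarized in the abstract. It is not the authors' implementation and rests on several assumptions. Each of the MLP's two output layers is assumed to produce a vector in the word-embedding space. The two vectors are assumed to be fused by a simple weighted average: `combine_outputs` and its weight `alpha` are hypothetical, since the abstract does not say how ConSSEV combines the outputs. An unseen video is assumed to receive the class whose word vector is nearest under cosine similarity. CNN feature extraction and MLP training are omitted. The three-dimensional label vectors are toy values, not real word vectors.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def combine_outputs(out_a: Vector, out_b: Vector, alpha: float = 0.5) -> List[float]:
    """Hypothetical fusion of the two output layers as a convex combination.
    The actual ConSSEV combination rule may differ."""
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(out_a, out_b)]


def zero_shot_predict(embedding: Vector, label_vectors: Dict[str, Vector]) -> str:
    """Return the unseen label whose word vector is most similar to the predicted embedding."""
    return max(label_vectors, key=lambda name: cosine(embedding, label_vectors[name]))


# Toy word vectors for three unseen UCF101 class names (illustrative values only).
labels = {
    "Surfing": [1.0, 0.0, 0.0],
    "JugglingBalls": [0.0, 1.0, 0.0],
    "Rowing": [0.0, 0.0, 1.0],
}

fused = combine_outputs([0.6, 0.5, 0.0], [0.2, 0.9, 0.0])
print(zero_shot_predict(fused, labels))  # prints JugglingBalls
```

In this toy case the first output layer alone would point to "Surfing". The fused vector points to "JugglingBalls", which illustrates how combining two output layers can change the zero-shot decision.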

Original language	English (US)
Title of host publication	2016 IEEE International Conference on Image Processing, ICIP 2016 - Proceedings
Publisher	IEEE Computer Society
Pages	4195-4199
Number of pages	5
ISBN (Electronic)	9781467399616
DOIs	https://doi.org/10.1109/ICIP.2016.7533150
State	Published - Aug 3 2016
Event	23rd IEEE International Conference on Image Processing, ICIP 2016 - Phoenix, United States
Duration	Sep 25 2016 → Sep 28 2016

Publication series

Name	Proceedings - International Conference on Image Processing, ICIP
Volume	2016-August
ISSN (Print)	1522-4880

Keywords

Action recognition
Convolutional neural network
Multi-layer perceptron
Zero-shot learning

ASJC Scopus subject areas

Software
Computer Vision and Pattern Recognition
Signal Processing

Cite this

Recognizing unseen actions in a domain-adapted embedding space. / Li, Yikang; Hu, Sheng Hung; Li, Baoxin.
2016 IEEE International Conference on Image Processing, ICIP 2016 - Proceedings. IEEE Computer Society, 2016. p. 4195-4199 7533150 (Proceedings - International Conference on Image Processing, ICIP; Vol. 2016-August).

Li, Y, Hu, SH & Li, B 2016, Recognizing unseen actions in a domain-adapted embedding space. in 2016 IEEE International Conference on Image Processing, ICIP 2016 - Proceedings, 7533150, Proceedings - International Conference on Image Processing, ICIP, vol. 2016-August, IEEE Computer Society, pp. 4195-4199, 23rd IEEE International Conference on Image Processing, ICIP 2016, Phoenix, United States, 9/25/16. https://doi.org/10.1109/ICIP.2016.7533150

@inproceedings{7fd0dbc3e0144188a623e35c81ae875b,
  title = "Recognizing unseen actions in a domain-adapted embedding space",
  abstract = "With the continued growth of multimedia data, Zero-shot Learning (ZSL) techniques have attracted much attention in recent years for their ability to train models that can handle 'unseen' categories. Existing ZSL algorithms mainly rely on attribute-based semantic spaces and focus only on static image data. Moreover, most ZSL studies consider only semantically embedded labels and fail to address the domain shift problem. In this paper, we propose a deep two-output model for video ZSL and action recognition: spatial and temporal features are computed from video content by distinct Convolutional Neural Networks (CNNs), and a Multi-layer Perceptron (MLP) is trained on the extracted features to map videos to semantic embedding word vectors. We also introduce a domain adaptation strategy named 'ConSSEV', which combines the outputs of the two distinct output layers of our MLP to improve zero-shot learning results. Our experiments on the UCF101 dataset demonstrate that the proposed model shows greater advantages when more complex video embedding schemes are used, and that it outperforms state-of-the-art zero-shot learning techniques.",
  keywords = "Action recognition, Convolutional neural network, Multi-layer perceptron, Zero-shot learning",
  author = "Yikang Li and Hu, {Sheng Hung} and Baoxin Li",
  note = "Publisher Copyright: {\textcopyright} 2016 IEEE.; 23rd IEEE International Conference on Image Processing, ICIP 2016 ; Conference date: 25-09-2016 Through 28-09-2016",
  year = "2016",
  month = aug,
  day = "3",
  doi = "10.1109/ICIP.2016.7533150",
  language = "English (US)",
  series = "Proceedings - International Conference on Image Processing, ICIP",
  publisher = "IEEE Computer Society",
  pages = "4195--4199",
  booktitle = "2016 IEEE International Conference on Image Processing, ICIP 2016 - Proceedings",
}

TY  - GEN
T1  - Recognizing unseen actions in a domain-adapted embedding space
AU  - Li, Yikang
AU  - Hu, Sheng Hung
AU  - Li, Baoxin
PY  - 2016/8/3
Y1  - 2016/8/3
N2  - With the continued growth of multimedia data, Zero-shot Learning (ZSL) techniques have attracted much attention in recent years for their ability to train models that can handle 'unseen' categories. Existing ZSL algorithms mainly rely on attribute-based semantic spaces and focus only on static image data. Moreover, most ZSL studies consider only semantically embedded labels and fail to address the domain shift problem. In this paper, we propose a deep two-output model for video ZSL and action recognition: spatial and temporal features are computed from video content by distinct Convolutional Neural Networks (CNNs), and a Multi-layer Perceptron (MLP) is trained on the extracted features to map videos to semantic embedding word vectors. We also introduce a domain adaptation strategy named 'ConSSEV', which combines the outputs of the two distinct output layers of our MLP to improve zero-shot learning results. Our experiments on the UCF101 dataset demonstrate that the proposed model shows greater advantages when more complex video embedding schemes are used, and that it outperforms state-of-the-art zero-shot learning techniques.
AB  - With the continued growth of multimedia data, Zero-shot Learning (ZSL) techniques have attracted much attention in recent years for their ability to train models that can handle 'unseen' categories. Existing ZSL algorithms mainly rely on attribute-based semantic spaces and focus only on static image data. Moreover, most ZSL studies consider only semantically embedded labels and fail to address the domain shift problem. In this paper, we propose a deep two-output model for video ZSL and action recognition: spatial and temporal features are computed from video content by distinct Convolutional Neural Networks (CNNs), and a Multi-layer Perceptron (MLP) is trained on the extracted features to map videos to semantic embedding word vectors. We also introduce a domain adaptation strategy named 'ConSSEV', which combines the outputs of the two distinct output layers of our MLP to improve zero-shot learning results. Our experiments on the UCF101 dataset demonstrate that the proposed model shows greater advantages when more complex video embedding schemes are used, and that it outperforms state-of-the-art zero-shot learning techniques.
KW  - Action recognition
KW  - Convolutional neural network
KW  - Multi-layer perceptron
KW  - Zero-shot learning
UR  - http://www.scopus.com/inward/record.url?scp=85006826677&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85006826677&partnerID=8YFLogxK
U2  - 10.1109/ICIP.2016.7533150
DO  - 10.1109/ICIP.2016.7533150
M3  - Conference contribution
AN  - SCOPUS:85006826677
T3  - Proceedings - International Conference on Image Processing, ICIP
SP  - 4195
EP  - 4199
BT  - 2016 IEEE International Conference on Image Processing, ICIP 2016 - Proceedings
PB  - IEEE Computer Society
T2  - 23rd IEEE International Conference on Image Processing, ICIP 2016
Y2  - 25 September 2016 through 28 September 2016
ER  -
