Temporal transformer networks: Joint learning of invariant and discriminative time warping

Suhas Lohit; Qiao Wang; Pavan Turaga

doi:10.1109/CVPR.2019.01271

Temporal transformer networks: Joint learning of invariant and discriminative time warping

Suhas Lohit, Qiao Wang, Pavan Turaga

Arts, Media and Engineering, School of (AME)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

38 Scopus citations

Abstract

Many time-series classification problems involve developing metrics that are invariant to temporal misalignment. In human activity analysis, temporal misalignment arises due to various reasons including differing initial phase, sensor sampling rates, and elastic time-warps due to subject-specific biomechanics. Past work in this area has only looked at reducing intra-class variability by elastic temporal alignment. In this paper, we propose a hybrid model-based and data-driven approach to learn warping functions that not just reduce intra-class variability, but also increase inter-class separation. We call this a temporal transformer network (TTN). TTN is an interpretable differentiable module, which can be easily integrated at the front end of a classification network. The module is capable of reducing intra-class variance by generating input-dependent warping functions which lead to rate-robust representations. At the same time, it increases inter-class variance by learning warping functions that are more discriminative. We show improvements over strong baselines in 3D action recognition on challenging datasets using the proposed framework. The improvements are especially pronounced when training sets are smaller.

Original language	English (US)
Title of host publication	Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Publisher	IEEE Computer Society
Pages	12418-12427
Number of pages	10
ISBN (Electronic)	9781728132938
DOIs	https://doi.org/10.1109/CVPR.2019.01271
State	Published - Jun 2019
Event	32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 - Long Beach, United States Duration: Jun 16 2019 → Jun 20 2019

Publication series

Name	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume	2019-June
ISSN (Print)	1063-6919

Conference

Conference	32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019
Country/Territory	United States
City	Long Beach
Period	6/16/19 → 6/20/19

Keywords

Action Recognition
RGBD sensors and analytics
Representation Learning
Statistical Learning

ASJC Scopus subject areas

Software
Computer Vision and Pattern Recognition

Access to Document

10.1109/CVPR.2019.01271

Cite this

Lohit, S., Wang, Q., & Turaga, P. (2019). Temporal transformer networks: Joint learning of invariant and discriminative time warping. In Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 (pp. 12418-12427). Article 8954044 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 2019-June). IEEE Computer Society. https://doi.org/10.1109/CVPR.2019.01271

Temporal transformer networks: Joint learning of invariant and discriminative time warping. / Lohit, Suhas; Wang, Qiao; Turaga, Pavan.
Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019. IEEE Computer Society, 2019. p. 12418-12427 8954044 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition; Vol. 2019-June).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Lohit, S, Wang, Q & Turaga, P 2019, Temporal transformer networks: Joint learning of invariant and discriminative time warping. in Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019., 8954044, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2019-June, IEEE Computer Society, pp. 12418-12427, 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, United States, 6/16/19. https://doi.org/10.1109/CVPR.2019.01271

Lohit S, Wang Q, Turaga P. Temporal transformer networks: Joint learning of invariant and discriminative time warping. In Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019. IEEE Computer Society. 2019. p. 12418-12427. 8954044. (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR.2019.01271

Lohit, Suhas ; Wang, Qiao ; Turaga, Pavan. / Temporal transformer networks : Joint learning of invariant and discriminative time warping. Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019. IEEE Computer Society, 2019. pp. 12418-12427 (Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

@inproceedings{a1e26add939a4249a5486c5747a9ed48,

title = "Temporal transformer networks: Joint learning of invariant and discriminative time warping",

abstract = "Many time-series classification problems involve developing metrics that are invariant to temporal misalignment. In human activity analysis, temporal misalignment arises due to various reasons including differing initial phase, sensor sampling rates, and elastic time-warps due to subject-specific biomechanics. Past work in this area has only looked at reducing intra-class variability by elastic temporal alignment. In this paper, we propose a hybrid model-based and data-driven approach to learn warping functions that not just reduce intra-class variability, but also increase inter-class separation. We call this a temporal transformer network (TTN). TTN is an interpretable differentiable module, which can be easily integrated at the front end of a classification network. The module is capable of reducing intra-class variance by generating input-dependent warping functions which lead to rate-robust representations. At the same time, it increases inter-class variance by learning warping functions that are more discriminative. We show improvements over strong baselines in 3D action recognition on challenging datasets using the proposed framework. The improvements are especially pronounced when training sets are smaller.",

keywords = "Action Recognition, RGBD sensors and analytics, Representation Learning, Statistical Learning",

author = "Suhas Lohit and Qiao Wang and Pavan Turaga",

note = "Publisher Copyright: {\textcopyright} 2019 IEEE.; 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019 ; Conference date: 16-06-2019 Through 20-06-2019",

year = "2019",

month = jun,

doi = "10.1109/CVPR.2019.01271",

language = "English (US)",

series = "Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE Computer Society",

pages = "12418--12427",

booktitle = "Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019",

}

TY - GEN

T1 - Temporal transformer networks

T2 - 32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019

AU - Lohit, Suhas

AU - Wang, Qiao

AU - Turaga, Pavan

PY - 2019/6

Y1 - 2019/6

N2 - Many time-series classification problems involve developing metrics that are invariant to temporal misalignment. In human activity analysis, temporal misalignment arises due to various reasons including differing initial phase, sensor sampling rates, and elastic time-warps due to subject-specific biomechanics. Past work in this area has only looked at reducing intra-class variability by elastic temporal alignment. In this paper, we propose a hybrid model-based and data-driven approach to learn warping functions that not just reduce intra-class variability, but also increase inter-class separation. We call this a temporal transformer network (TTN). TTN is an interpretable differentiable module, which can be easily integrated at the front end of a classification network. The module is capable of reducing intra-class variance by generating input-dependent warping functions which lead to rate-robust representations. At the same time, it increases inter-class variance by learning warping functions that are more discriminative. We show improvements over strong baselines in 3D action recognition on challenging datasets using the proposed framework. The improvements are especially pronounced when training sets are smaller.

AB - Many time-series classification problems involve developing metrics that are invariant to temporal misalignment. In human activity analysis, temporal misalignment arises due to various reasons including differing initial phase, sensor sampling rates, and elastic time-warps due to subject-specific biomechanics. Past work in this area has only looked at reducing intra-class variability by elastic temporal alignment. In this paper, we propose a hybrid model-based and data-driven approach to learn warping functions that not just reduce intra-class variability, but also increase inter-class separation. We call this a temporal transformer network (TTN). TTN is an interpretable differentiable module, which can be easily integrated at the front end of a classification network. The module is capable of reducing intra-class variance by generating input-dependent warping functions which lead to rate-robust representations. At the same time, it increases inter-class variance by learning warping functions that are more discriminative. We show improvements over strong baselines in 3D action recognition on challenging datasets using the proposed framework. The improvements are especially pronounced when training sets are smaller.

KW - Action Recognition

KW - RGBD sensors and analytics

KW - Representation Learning

KW - Statistical Learning

UR - http://www.scopus.com/inward/record.url?scp=85078801210&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85078801210&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2019.01271

DO - 10.1109/CVPR.2019.01271

M3 - Conference contribution

AN - SCOPUS:85078801210

T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 12418

EP - 12427

BT - Proceedings - 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2019

PB - IEEE Computer Society

Y2 - 16 June 2019 through 20 June 2019

ER -

Temporal transformer networks: Joint learning of invariant and discriminative time warping

Abstract

Publication series

Conference

Keywords

ASJC Scopus subject areas

Access to Document

Other files and links

Fingerprint

Cite this