Learning the spatial semantics of manipulation actions through preposition grounding

Konstantinos Zampogiannis; Yezhou Yang; Cornelia Fermuller; Yiannis Aloimonos

doi:10.1109/ICRA.2015.7139371

Research output: Contribution to journal › Conference article › peer-review

44 Scopus citations

Abstract

In this paper, we introduce an abstract representation for manipulation actions that is based on the evolution of the spatial relations between involved objects. Object tracking in RGBD streams enables straightforward and intuitive ways to model spatial relations in 3D space. Reasoning in 3D overcomes many of the limitations of similar previous approaches, while providing significant flexibility in the desired level of abstraction. At each frame of a manipulation video, we evaluate a number of spatial predicates for all object pairs and treat the resulting set of sequences (Predicate Vector Sequences, PVS) as an action descriptor. As part of our representation, we introduce a symmetric, time-normalized pairwise distance measure that relies on finding an optimal object correspondence between two actions. We experimentally evaluate the method on the classification of various manipulation actions in video, performed at different speeds and timings and involving different objects. The results demonstrate that the proposed representation is remarkably descriptive of the high-level manipulation semantics.
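The pipeline outlined in the abstract (per-frame spatial predicates for all object pairs, collected into Predicate Vector Sequences, then compared under an optimal object correspondence) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the three predicates (`above`, `below`, `near`), the 5 cm `touch_dist` threshold, the use of object centroids with the z axis pointing up, the nearest-frame resampling used for time normalization, and the brute-force search over permutations. The function names `predicates`, `pvs`, `resample`, `action_distance` and `make_demo` are hypothetical.

```python
from itertools import permutations

import numpy as np


def predicates(a, b, touch_dist=0.05):
    """Evaluate three assumed predicates for 3D centroid a relative to b."""
    d = a - b
    return np.array([
        d[2] > 0,                         # "above" (assumes z points up)
        d[2] < 0,                         # "below"
        np.linalg.norm(d) < touch_dist,   # "near" (assumed threshold, metres)
    ], dtype=np.uint8)


def pvs(tracks):
    """tracks: (T, N, 3) centroids of N tracked objects over T frames.

    Returns an (N, N, T, 3) array holding one predicate vector sequence
    per ordered object pair; the diagonal (i == j) is left at zero.
    """
    T, N, _ = tracks.shape
    out = np.zeros((N, N, T, 3), dtype=np.uint8)
    for t in range(T):
        for i in range(N):
            for j in range(N):
                if i != j:
                    out[i, j, t] = predicates(tracks[t, i], tracks[t, j])
    return out


def resample(seq, length):
    """Time-normalize a (T, P) sequence by nearest-frame sampling."""
    idx = np.round(np.linspace(0, len(seq) - 1, length)).astype(int)
    return seq[idx]


def action_distance(pvs_a, pvs_b, length=20):
    """Smallest mean predicate mismatch over all object correspondences.

    Assumes both actions involve the same number of objects (at least two).
    Brute force over permutations, so only practical for a few objects.
    """
    n = pvs_a.shape[0]
    if n < 2 or pvs_b.shape[0] != n:
        raise ValueError("both actions need the same number (>= 2) of objects")
    best = np.inf
    for perm in permutations(range(n)):
        total = 0.0
        for i in range(n):
            for j in range(n):
                if i != j:
                    sa = resample(pvs_a[i, j], length)
                    sb = resample(pvs_b[perm[i], perm[j]], length)
                    total += np.mean(sa != sb)
        best = min(best, total / (n * (n - 1)))
    return best


def make_demo(T, mover, z_end=0.01):
    """Toy action: one object descends from z = 0.5 onto another at the origin."""
    tracks = np.zeros((T, 2, 3))
    tracks[:, mover, 2] = np.linspace(0.5, z_end, T)
    return tracks


# Same toy action at different speeds and with the object indices swapped.
slow = pvs(make_demo(30, mover=1))
fast = pvs(make_demo(10, mover=0))
print(action_distance(fast, slow))
```

Because the minimum is taken over all permutations, swapping the two arguments searches the same set of pairings, so this toy measure is symmetric. It is a stand-in for, not a reproduction of, the distance measure introduced in the paper.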

Original language	English (US)
Article number	7139371
Pages (from-to)	1389-1396
Number of pages	8
Journal	Proceedings - IEEE International Conference on Robotics and Automation
Volume	2015-June
Issue number	June
DOIs	https://doi.org/10.1109/ICRA.2015.7139371
State	Published - Jun 29 2015
Externally published	Yes
Event	2015 IEEE International Conference on Robotics and Automation, ICRA 2015 - Seattle, United States
Duration	May 26 2015 → May 30 2015

ASJC Scopus subject areas

Software
Control and Systems Engineering
Artificial Intelligence
Electrical and Electronic Engineering

Cite this

Learning the spatial semantics of manipulation actions through preposition grounding. / Zampogiannis, Konstantinos; Yang, Yezhou; Fermuller, Cornelia et al.
In: Proceedings - IEEE International Conference on Robotics and Automation, Vol. 2015-June, No. June, 7139371, 29.06.2015, p. 1389-1396.

@article{929026e231564809b9431747a2b344ac,
  title = "Learning the spatial semantics of manipulation actions through preposition grounding",
  abstract = "In this paper, we introduce an abstract representation for manipulation actions that is based on the evolution of the spatial relations between involved objects. Object tracking in RGBD streams enables straightforward and intuitive ways to model spatial relations in 3D space. Reasoning in 3D overcomes many of the limitations of similar previous approaches, while providing significant flexibility in the desired level of abstraction. At each frame of a manipulation video, we evaluate a number of spatial predicates for all object pairs and treat the resulting set of sequences (Predicate Vector Sequences, PVS) as an action descriptor. As part of our representation, we introduce a symmetric, time-normalized pairwise distance measure that relies on finding an optimal object correspondence between two actions. We experimentally evaluate the method on the classification of various manipulation actions in video, performed at different speeds and timings and involving different objects. The results demonstrate that the proposed representation is remarkably descriptive of the high-level manipulation semantics.",
  author = "Konstantinos Zampogiannis and Yezhou Yang and Cornelia Fermuller and Yiannis Aloimonos",
  year = "2015",
  month = jun,
  day = "29",
  doi = "10.1109/ICRA.2015.7139371",
  language = "English (US)",
  volume = "2015-June",
  pages = "1389--1396",
  journal = "Proceedings - IEEE International Conference on Robotics and Automation",
  issn = "1050-4729",
  publisher = "Institute of Electrical and Electronics Engineers Inc.",
  number = "June",
  note = "2015 IEEE International Conference on Robotics and Automation, ICRA 2015 ; Conference date: 26-05-2015 Through 30-05-2015",
}

TY - JOUR

T1 - Learning the spatial semantics of manipulation actions through preposition grounding

AU - Zampogiannis, Konstantinos

AU - Yang, Yezhou

AU - Fermuller, Cornelia

AU - Aloimonos, Yiannis

PY - 2015/6/29

Y1 - 2015/6/29

AB - In this paper, we introduce an abstract representation for manipulation actions that is based on the evolution of the spatial relations between involved objects. Object tracking in RGBD streams enables straightforward and intuitive ways to model spatial relations in 3D space. Reasoning in 3D overcomes many of the limitations of similar previous approaches, while providing significant flexibility in the desired level of abstraction. At each frame of a manipulation video, we evaluate a number of spatial predicates for all object pairs and treat the resulting set of sequences (Predicate Vector Sequences, PVS) as an action descriptor. As part of our representation, we introduce a symmetric, time-normalized pairwise distance measure that relies on finding an optimal object correspondence between two actions. We experimentally evaluate the method on the classification of various manipulation actions in video, performed at different speeds and timings and involving different objects. The results demonstrate that the proposed representation is remarkably descriptive of the high-level manipulation semantics.

UR - http://www.scopus.com/inward/record.url?scp=84938265517&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84938265517&partnerID=8YFLogxK

U2 - 10.1109/ICRA.2015.7139371

DO - 10.1109/ICRA.2015.7139371

M3 - Conference article

AN - SCOPUS:84938265517

SN - 1050-4729

VL - 2015-June

SP - 1389

EP - 1396

JO - Proceedings - IEEE International Conference on Robotics and Automation

JF - Proceedings - IEEE International Conference on Robotics and Automation

IS - June

M1 - 7139371

T2 - 2015 IEEE International Conference on Robotics and Automation, ICRA 2015

Y2 - 26 May 2015 through 30 May 2015

ER -
