Learning the spatial semantics of manipulation actions through preposition grounding

Konstantinos Zampogiannis, Yezhou Yang, Cornelia Fermuller, Yiannis Aloimonos

Research output: Contribution to journal › Conference article › peer-review

44 Scopus citations

Abstract

In this paper, we introduce an abstract representation for manipulation actions that is based on the evolution of the spatial relations between involved objects. Object tracking in RGBD streams enables straightforward and intuitive ways to model spatial relations in 3D space. Reasoning in 3D overcomes many of the limitations of similar previous approaches, while providing significant flexibility in the desired level of abstraction. At each frame of a manipulation video, we evaluate a number of spatial predicates for all object pairs and treat the resulting set of sequences (Predicate Vector Sequences, PVS) as an action descriptor. As part of our representation, we introduce a symmetric, time-normalized pairwise distance measure that relies on finding an optimal object correspondence between two actions. We experimentally evaluate the method on the classification of various manipulation actions in video, performed at different speeds and timings and involving different objects. The results demonstrate that the proposed representation is remarkably descriptive of the high-level manipulation semantics.
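
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration only: the three predicates (`above`, `below`, `near`) and the 0.10 m threshold, the use of dynamic time warping with a Hamming frame cost as the time-normalized sequence distance, the brute-force search over object permutations as the correspondence step, and all function names. It also assumes both actions involve the same number of tracked objects, each given as 3D centroids per frame.

```python
from itertools import permutations

import numpy as np


def predicates(p, q, near=0.10):
    """Hypothetical predicates for the ordered pair (p, q) of 3D points (metres, z up)."""
    d = q - p
    return np.array([
        d[2] > near,                # q is above p
        d[2] < -near,               # q is below p
        np.linalg.norm(d) < near,   # q is near p
    ], dtype=bool)


def pvs(tracks):
    """tracks: (T, N, 3) centroids -> {(i, j): (T, P) bool array} for ordered pairs i != j."""
    T, N, _ = tracks.shape
    return {
        (i, j): np.array([predicates(tracks[t, i], tracks[t, j]) for t in range(T)])
        for i in range(N) for j in range(N) if i != j
    }


def dtw(a, b):
    """Assumed sequence distance: DTW with Hamming frame cost, normalized by total length."""
    n, m = len(a), len(b)
    cost = (a[:, None, :] != b[None, :, :]).mean(axis=2)  # (n, m) fraction of differing predicates
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)


def action_distance(tracks_a, tracks_b):
    """Minimum mean pairwise distance over all object correspondences (brute force)."""
    n = tracks_a.shape[1]
    if tracks_b.shape[1] != n:
        raise ValueError("this sketch assumes the same number of objects in both actions")
    pa, pb = pvs(tracks_a), pvs(tracks_b)
    best = np.inf
    for perm in permutations(range(n)):
        total = np.mean([dtw(pa[(i, j)], pb[(perm[i], perm[j])]) for (i, j) in pa])
        best = min(best, total)
    return best
```

Because the warping steps and the frame cost are symmetric and the correspondence is a bijection, the resulting distance is symmetric up to floating-point rounding, and two recordings of the same predicate sequence at different speeds or with relabelled objects are at distance zero. The brute-force search is factorial in the number of objects, so it is only practical for the small object counts typical of a single manipulation.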

Original language: English (US)
Article number: 7139371
Pages (from-to): 1389-1396
Number of pages: 8
Journal: Proceedings - IEEE International Conference on Robotics and Automation
Volume: 2015-June
Issue number: June
State: Published - Jun 29 2015
Externally published: Yes
Event: 2015 IEEE International Conference on Robotics and Automation, ICRA 2015 - Seattle, United States
Duration: May 26 2015 → May 30 2015

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Artificial Intelligence
  • Electrical and Electronic Engineering
