Abstract
In this paper, we introduce an abstract representation for manipulation actions that is based on the evolution of the spatial relations between involved objects. Object tracking in RGBD streams enables straightforward and intuitive ways to model spatial relations in 3D space. Reasoning in 3D overcomes many of the limitations of similar previous approaches, while providing significant flexibility in the desired level of abstraction. At each frame of a manipulation video, we evaluate a number of spatial predicates for all object pairs and treat the resulting set of sequences (Predicate Vector Sequences, PVS) as an action descriptor. As part of our representation, we introduce a symmetric, time-normalized pairwise distance measure that relies on finding an optimal object correspondence between two actions. We experimentally evaluate the method on the classification of various manipulation actions in video, performed at different speeds and timings and involving different objects. The results demonstrate that the proposed representation is remarkably descriptive of the high-level manipulation semantics.
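The pipeline the abstract describes (per-frame spatial predicates for every object pair, followed by a symmetric, time-normalized distance under the best object correspondence) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the three predicates (`near`, `above`, `right of`), the 0.1 distance threshold, nearest-index resampling as the time normalization, the Hamming-style mismatch score, the brute-force search over object permutations, and all function names.

```python
from itertools import permutations


def predicate_vector(p, q, near=0.1):
    """Hypothetical predicates for an ordered pair of 3D points (x, y, z)."""
    dx, dy, dz = p[0] - q[0], p[1] - q[1], p[2] - q[2]
    dist = (dx * dx + dy * dy + dz * dz) ** 0.5
    # (p is near q, p is above q, p is to the right of q)
    return (int(dist < near), int(dz > 0), int(dx > 0))


def pvs(tracks):
    """Build predicate vector sequences from per-object 3D trajectories.

    tracks[i][t] is the (x, y, z) position of object i at frame t.
    Returns a dict mapping each ordered pair (i, j) to its per-frame vectors.
    """
    n, frames = len(tracks), len(tracks[0])
    return {
        (i, j): [predicate_vector(tracks[i][t], tracks[j][t]) for t in range(frames)]
        for i in range(n) for j in range(n) if i != j
    }


def resample(seq, length):
    """Time-normalize a sequence to a fixed length by nearest-index sampling."""
    return [seq[k * len(seq) // length] for k in range(length)]


def action_distance(a, b, n_objects, length=20):
    """Smallest mismatch fraction over all object correspondences.

    Assumes both actions involve the same number of objects. The search is
    brute force, so it is only practical for a handful of objects.
    """
    best = None
    for perm in permutations(range(n_objects)):
        diff = total = 0
        for (i, j), seq_a in a.items():
            seq_b = b[(perm[i], perm[j])]
            for va, vb in zip(resample(seq_a, length), resample(seq_b, length)):
                diff += sum(x != y for x, y in zip(va, vb))
                total += len(va)
        score = diff / total
        best = score if best is None else min(best, score)
    return best
```

In this sketch, two recordings of the same motion at different speeds and with different object labels yield a distance of zero, because resampling removes the timing difference and the permutation search absorbs the relabeling.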
Original language | English (US) |
---|---|
Article number | 7139371 |
Pages (from-to) | 1389-1396 |
Number of pages | 8 |
Journal | Proceedings - IEEE International Conference on Robotics and Automation |
Volume | 2015-June |
Issue number | June |
State | Published - Jun 29 2015 |
Externally published | Yes |
Event | 2015 IEEE International Conference on Robotics and Automation, ICRA 2015 - Seattle, United States. Duration: May 26 2015 → May 30 2015 |
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Artificial Intelligence
- Electrical and Electronic Engineering