Unsupervised Linking of Visual Features to Textual Descriptions in Long Manipulation Activities

Eren Erdal Aksoy, Ekaterina Ovchinnikova, Adil Orhan, Yezhou Yang, Tamim Asfour

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

We present a novel unsupervised framework that links continuous visual features to symbolic textual descriptions of manipulation activity videos. First, we extract a semantic representation of visually observed manipulations by applying a bottom-up approach to the continuous image streams. We then employ rule-based reasoning to link the visual and linguistic inputs. The proposed framework allows robots 1) to autonomously parse, classify, and label sequentially and/or concurrently performed atomic manipulations (e.g., 'cutting' or 'stirring'), 2) to simultaneously categorize and identify manipulated objects without using any standard feature-based recognition techniques, and 3) to generate textual descriptions for long activities, e.g., 'breakfast preparation.' We evaluated the framework on a dataset of 120 atomic manipulations and 20 long activities.
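For intuition only, below is a minimal Python sketch of the kind of rule-based linking step the abstract describes: symbolic predicates produced by a bottom-up visual front end are matched against hand-written rules to label parsed video segments and to compose a textual description of a long activity. This is not the authors' implementation; all predicate names, rules, and labels here are hypothetical placeholders chosen for illustration.

# Illustrative sketch only (not the authors' method): rule-based linking of
# hypothetical visual event predicates to textual action labels.
from typing import Dict, FrozenSet, List

# Hypothetical rules: each atomic manipulation label is associated with the
# set of symbolic predicates a bottom-up visual front end might emit for it.
RULES: Dict[str, FrozenSet[str]] = {
    "cutting":  frozenset({"tool_touches_object", "object_divides"}),
    "stirring": frozenset({"tool_touches_object", "tool_moves_cyclically"}),
    "pouring":  frozenset({"container_tilts", "substance_transfers"}),
}

def label_segment(observed: FrozenSet[str]) -> str:
    """Return the best-matching textual label for one parsed video segment."""
    best_label, best_overlap = "unknown", 0
    for label, required in RULES.items():
        # A rule fires only if all of its predicates were observed.
        if required <= observed and len(required) > best_overlap:
            best_label, best_overlap = label, len(required)
    return best_label

def describe_activity(segments: List[FrozenSet[str]]) -> str:
    """Compose a simple textual description for a sequence of segments."""
    return " then ".join(label_segment(s) for s in segments)

if __name__ == "__main__":
    # Toy "long activity": two sequentially parsed segments.
    demo = [
        frozenset({"tool_touches_object", "object_divides"}),
        frozenset({"tool_touches_object", "tool_moves_cyclically"}),
    ]
    print(describe_activity(demo))  # -> "cutting then stirring"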

Original language: English (US)
Article number: 7856986
Pages (from-to): 1397-1404
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Volume: 2
Issue number: 3
DOI: 10.1109/LRA.2017.2669363
State: Published - Jul 1 2017


Keywords

  • Cognitive human-robot interaction
  • learning and adaptive systems
  • semantic scene understanding

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Human-Computer Interaction
  • Biomedical Engineering
  • Mechanical Engineering
  • Control and Optimization
  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition

Cite this

Aksoy, E. E., Ovchinnikova, E., Orhan, A., Yang, Y., & Asfour, T. (2017). Unsupervised Linking of Visual Features to Textual Descriptions in Long Manipulation Activities. IEEE Robotics and Automation Letters, 2(3), 1397-1404. Article 7856986. https://doi.org/10.1109/LRA.2017.2669363
@article{be0986d477f2447fab32c86842902f79,
  title = "Unsupervised Linking of Visual Features to Textual Descriptions in Long Manipulation Activities",
  abstract = "We present a novel unsupervised framework, which links continuous visual features and symbolic textual descriptions of manipulation activity videos. First, we extract the semantic representation of visually observed manipulations by applying a bottom-up approach to the continuous image streams. We then employ a rule-based reasoning to link visual and linguistic inputs. The proposed framework allows robots 1) to autonomously parse, classify, and label sequentially and/or concurrently performed atomic manipulations (e.g., 'cutting' or 'stirring'), 2) to simultaneously categorize and identify manipulated objects without using any standard feature-based recognition techniques, and 3) to generate textual descriptions for long activities, e.g., 'breakfast preparation.' We evaluated the framework using a dataset of 120 atomic manipulations and 20 long activities.",
  keywords = "Cognitive human-robot interaction, learning and adaptive systems, semantic scene understanding",
  author = "Aksoy, {Eren Erdal} and Ekaterina Ovchinnikova and Adil Orhan and Yezhou Yang and Tamim Asfour",
  year = "2017",
  month = "7",
  day = "1",
  doi = "10.1109/LRA.2017.2669363",
  language = "English (US)",
  volume = "2",
  pages = "1397--1404",
  journal = "IEEE Robotics and Automation Letters",
  issn = "2377-3766",
  number = "3",
}
