A corpus-guided framework for robotic visual perception

Ching L. Teo, Yezhou Yang, Hal Daumé, Cornelia Fermüller, Yiannis Aloimonos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

We present a framework that produces sentence-level summaries of videos containing complex human activities, which can be implemented as part of the Robot Perception Control Unit (RPCU). This is done via: 1) detecting pertinent objects in the scene (tools and direct objects), 2) predicting actions guided by a large lexical corpus, and 3) generating the most likely sentence description of the video given the detections. We pursue an active object detection approach by focusing on regions of high optical flow. Next, an iterative EM strategy, guided by language, is used to predict the possible actions. Finally, we model the sentence generation process as an HMM optimization problem, combining visual detections and a trained language model to produce a readable description of the video. Experimental results validate our approach, and we discuss the implications of our approach for the RPCU in future applications.
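The abstract's final step, casting sentence generation as an HMM optimization that combines visual detections with a corpus-trained language model, can be illustrated with a minimal Viterbi-decoding sketch. This is not the authors' implementation: the slot structure (subject, verb, object), the candidate words, and every emission and bigram probability below are made-up placeholders standing in for real detector scores and corpus-derived language-model estimates.

import math

# Candidate words for each sentence slot (hypothetical detector vocabulary).
slots = [
    ["person"],                   # subject
    ["cut", "pour", "stir"],      # verb (action)
    ["bread", "bowl", "knife"],   # direct object
]

# Visual detection scores, standing in for P(observation | word) from object
# and action detectors run on high-optical-flow regions (placeholder values).
emission = {
    "person": 0.9,
    "cut": 0.5, "pour": 0.2, "stir": 0.3,
    "bread": 0.6, "bowl": 0.1, "knife": 0.3,
}

# Bigram transition probabilities P(word | previous word), as might be
# estimated from a large lexical corpus (placeholder values).
bigram = {
    ("<s>", "person"): 1.0,
    ("person", "cut"): 0.5, ("person", "pour"): 0.3, ("person", "stir"): 0.2,
    ("cut", "bread"): 0.7, ("cut", "bowl"): 0.05, ("cut", "knife"): 0.25,
    ("pour", "bread"): 0.1, ("pour", "bowl"): 0.8, ("pour", "knife"): 0.1,
    ("stir", "bread"): 0.2, ("stir", "bowl"): 0.7, ("stir", "knife"): 0.1,
}

def viterbi(slots, emission, bigram):
    """Return the most likely word sequence and its log-probability."""
    # trellis[t][word] = (best log-prob of reaching `word` at slot t, backpointer)
    trellis = [{w: (math.log(bigram[("<s>", w)] * emission[w]), None)
                for w in slots[0]}]
    for t in range(1, len(slots)):
        layer = {}
        for w in slots[t]:
            prev_scores = {p: s + math.log(bigram[(p, w)] * emission[w])
                           for p, (s, _) in trellis[t - 1].items()}
            best_prev = max(prev_scores, key=prev_scores.get)
            layer[w] = (prev_scores[best_prev], best_prev)
        trellis.append(layer)
    # Backtrack from the best word in the final slot.
    word = max(trellis[-1], key=lambda w: trellis[-1][w][0])
    score = trellis[-1][word][0]
    sentence = [word]
    for t in range(len(slots) - 1, 0, -1):
        word = trellis[t][word][1]
        sentence.append(word)
    return list(reversed(sentence)), score

if __name__ == "__main__":
    words, logp = viterbi(slots, emission, bigram)
    print(" ".join(words), "(log-prob %.2f)" % logp)   # -> person cut bread

On these placeholder numbers the decoder outputs "person cut bread": the bigram model strongly prefers "cut bread" over "cut bowl", so corpus statistics can override a noisy visual score, which is the intuition behind letting a large lexical corpus guide perception.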

Original language: English (US)
Title of host publication: Language-Action Tools for Cognitive Artificial Agents: Integrating Vision, Action and Language - Papers from the 2011 AAAI Workshop, Technical Report
Pages: 36-42
Number of pages: 7
Volume: WS-11-14
ISBN (Print): 9781577355304
State: Published - 2011
Externally published: Yes
Event: 2011 AAAI Workshop - San Francisco, CA, United States
Duration: Aug 7, 2011 - Aug 8, 2011

Other

Other: 2011 AAAI Workshop
Country: United States
City: San Francisco, CA
Period: 8/7/11 - 8/8/11

Fingerprint

  • Robotics
  • Robots
  • Optical flows
  • Object detection

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Teo, C. L., Yang, Y., Daumé, H., Fermüller, C., & Aloimonos, Y. (2011). A corpus-guided framework for robotic visual perception. In Language-Action Tools for Cognitive Artificial Agents: Integrating Vision, Action and Language - Papers from the 2011 AAAI Workshop, Technical Report (Vol. WS-11-14, pp. 36-42).
