A corpus-guided framework for robotic visual perception

Ching L. Teo, Yezhou Yang, Hal Daumé, Cornelia Fermüller, Yiannis Aloimonos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations


We present a framework that produces sentence-level summaries of videos containing complex human activities and that can be implemented as part of the Robot Perception Control Unit (RPCU). It works in three steps: 1) detecting pertinent objects in the scene (tools and direct objects), 2) predicting actions guided by a large lexical corpus, and 3) generating the most likely sentence description of the video given the detections. We pursue an active object detection approach by focusing on regions of high optical flow. Next, an iterative EM strategy, guided by language, is used to predict the possible actions. Finally, we model the sentence generation process as an HMM optimization problem, combining visual detections and a trained language model to produce a readable description of the video. Experimental results validate our approach, and we discuss its implications for the RPCU in future applications.

Original language: English (US)
Title of host publication: Language-Action Tools for Cognitive Artificial Agents
Subtitle of host publication: Integrating Vision, Action and Language - Papers from the 2011 AAAI Workshop, Technical Report
Number of pages: 7
State: Published - 2011
Externally published: Yes
Event: 2011 AAAI Workshop - San Francisco, CA, United States
Duration: Aug 7 2011 - Aug 8 2011

Publication series

Name: AAAI Workshop - Technical Report


Other: 2011 AAAI Workshop
Country/Territory: United States
City: San Francisco, CA

ASJC Scopus subject areas

  • Engineering (all)

