A corpus-guided framework for robotic visual perception

Ching L. Teo; Yezhou Yang; Hal Daumé; Cornelia Fermüller; Yiannis Aloimonos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a framework that produces sentence-level summarizations of videos containing complex human activities and that can be implemented as part of the Robot Perception Control Unit (RPCU). This is done by: 1) detecting pertinent objects in the scene (tools and direct objects), 2) predicting actions guided by a large lexical corpus, and 3) generating the most likely sentence description of the video given the detections. We pursue an active object detection approach by focusing on regions of high optical flow. Next, an iterative EM strategy, guided by language, is used to predict the possible actions. Finally, we model the sentence generation process as an HMM optimization problem, combining visual detections and a trained language model to produce a readable description of the video. Experimental results validate our approach, and we discuss its implications for the RPCU in future applications.
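The abstract casts sentence generation as HMM decoding that combines visual detections with a language model. The sketch below is a hypothetical illustration of that general idea, not the authors' implementation: it runs standard Viterbi decoding over a small trellis of sentence slots (subject, verb, object, tool), treating each word's detector confidence as its emission score and a bigram language-model probability as the transition between consecutive slots. The slot layout, vocabulary, probabilities, floor value, and the `viterbi_sentence` function are all assumptions invented for this example.

```python
import math

# Hypothetical toy model: slots, words, and probabilities are invented for illustration.
SLOTS = [
    ["person"],          # subject
    ["cut", "stir"],     # verb (action)
    ["bread", "soup"],   # direct object
    ["knife", "spoon"],  # tool
]
# Emission: assumed detector confidence for each word, used as its observation score.
EMISSION = {"person": 0.9, "cut": 0.5, "stir": 0.5,
            "bread": 0.7, "soup": 0.3, "knife": 0.6, "spoon": 0.4}
# Transition: assumed bigram language-model probabilities P(word | previous word).
TRANSITION = {
    ("person", "cut"): 0.5, ("person", "stir"): 0.5,
    ("cut", "bread"): 0.6, ("cut", "soup"): 0.05,
    ("stir", "soup"): 0.6, ("stir", "bread"): 0.05,
    ("bread", "knife"): 0.7, ("bread", "spoon"): 0.1,
    ("soup", "spoon"): 0.7, ("soup", "knife"): 0.1,
}


def viterbi_sentence(slots, emission, transition, floor=1e-6):
    """Return (best word path, its log score) through the slot trellis.

    Unseen bigrams receive the small `floor` probability (an assumption).
    """
    # best[w] = (log score of the best partial path ending in w, that path)
    best = {w: (math.log(emission[w]), [w]) for w in slots[0]}
    for candidates in slots[1:]:
        new_best = {}
        for w in candidates:
            # Choose the predecessor that maximizes path score + log transition.
            score, path = max(
                (s + math.log(transition.get((prev, w), floor)), p)
                for prev, (s, p) in best.items()
            )
            new_best[w] = (score + math.log(emission[w]), path + [w])
        best = new_best
    score, path = max(best.values())
    return path, score


path, log_score = viterbi_sentence(SLOTS, EMISSION, TRANSITION)
print(" ".join(path), round(math.exp(log_score), 5))
# prints: person cut bread knife 0.03969
```

Here the two verbs are equally likely from vision alone, so the bigram transitions decide: the path whose verb agrees with the detected object and tool wins, which is roughly the kind of language-guided disambiguation the abstract describes. Scoring in log space avoids numerical underflow on longer sentences.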

Original language	English (US)
Title of host publication	Language-Action Tools for Cognitive Artificial Agents
Subtitle of host publication	Integrating Vision, Action and Language - Papers from the 2011 AAAI Workshop, Technical Report
Pages	36-42
Number of pages	7
State	Published - 2011
Externally published	Yes
Event	2011 AAAI Workshop - San Francisco, CA, United States; Duration: Aug 7 2011 → Aug 8 2011

Publication series

Name	AAAI Workshop - Technical Report
Volume	WS-11-14

Other

Other	2011 AAAI Workshop
Country/Territory	United States
City	San Francisco, CA
Period	8/7/11 → 8/8/11

ASJC Scopus subject areas

General Engineering

Cite this

A corpus-guided framework for robotic visual perception. / Teo, Ching L.; Yang, Yezhou; Daumé, Hal et al.
Language-Action Tools for Cognitive Artificial Agents: Integrating Vision, Action and Language - Papers from the 2011 AAAI Workshop, Technical Report. 2011. p. 36-42 (AAAI Workshop - Technical Report; Vol. WS-11-14).

Teo, CL, Yang, Y, Daumé, H, Fermüller, C & Aloimonos, Y 2011, A corpus-guided framework for robotic visual perception. in Language-Action Tools for Cognitive Artificial Agents: Integrating Vision, Action and Language - Papers from the 2011 AAAI Workshop, Technical Report. AAAI Workshop - Technical Report, vol. WS-11-14, pp. 36-42, 2011 AAAI Workshop, San Francisco, CA, United States, 8/7/11.

@inproceedings{256dbbfcd1e34c16b2b26b4bceac2a1c,

title = "A corpus-guided framework for robotic visual perception",

abstract = "We present a framework that produces sentence-level summarizations of videos containing complex human activities and that can be implemented as part of the Robot Perception Control Unit (RPCU). This is done by: 1) detecting pertinent objects in the scene (tools and direct objects), 2) predicting actions guided by a large lexical corpus, and 3) generating the most likely sentence description of the video given the detections. We pursue an active object detection approach by focusing on regions of high optical flow. Next, an iterative EM strategy, guided by language, is used to predict the possible actions. Finally, we model the sentence generation process as an HMM optimization problem, combining visual detections and a trained language model to produce a readable description of the video. Experimental results validate our approach, and we discuss its implications for the RPCU in future applications.",

author = "Teo, {Ching L.} and Yezhou Yang and Hal Daum{\'e} and Cornelia Ferm{\"u}ller and Yiannis Aloimonos",

year = "2011",

language = "English (US)",

isbn = "9781577355304",

series = "AAAI Workshop - Technical Report",

pages = "36--42",

booktitle = "Language-Action Tools for Cognitive Artificial Agents",

note = "2011 AAAI Workshop; Conference date: 07-08-2011 through 08-08-2011",

}

TY - GEN

T1 - A corpus-guided framework for robotic visual perception

AU - Teo, Ching L.

AU - Yang, Yezhou

AU - Daumé, Hal

AU - Fermüller, Cornelia

AU - Aloimonos, Yiannis

PY - 2011

Y1 - 2011

N2 - We present a framework that produces sentence-level summarizations of videos containing complex human activities and that can be implemented as part of the Robot Perception Control Unit (RPCU). This is done by: 1) detecting pertinent objects in the scene (tools and direct objects), 2) predicting actions guided by a large lexical corpus, and 3) generating the most likely sentence description of the video given the detections. We pursue an active object detection approach by focusing on regions of high optical flow. Next, an iterative EM strategy, guided by language, is used to predict the possible actions. Finally, we model the sentence generation process as an HMM optimization problem, combining visual detections and a trained language model to produce a readable description of the video. Experimental results validate our approach, and we discuss its implications for the RPCU in future applications.

AB - We present a framework that produces sentence-level summarizations of videos containing complex human activities and that can be implemented as part of the Robot Perception Control Unit (RPCU). This is done by: 1) detecting pertinent objects in the scene (tools and direct objects), 2) predicting actions guided by a large lexical corpus, and 3) generating the most likely sentence description of the video given the detections. We pursue an active object detection approach by focusing on regions of high optical flow. Next, an iterative EM strategy, guided by language, is used to predict the possible actions. Finally, we model the sentence generation process as an HMM optimization problem, combining visual detections and a trained language model to produce a readable description of the video. Experimental results validate our approach, and we discuss its implications for the RPCU in future applications.

UR - http://www.scopus.com/inward/record.url?scp=80055059199&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80055059199&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:80055059199

SN - 9781577355304

T3 - AAAI Workshop - Technical Report

SP - 36

EP - 42

BT - Language-Action Tools for Cognitive Artificial Agents

T2 - 2011 AAAI Workshop

Y2 - 7 August 2011 through 8 August 2011

ER -