Robots with language: Multi-label visual recognition using NLP

Yezhou Yang; Ching L. Teo; Cornelia Fermuller; Yiannis Aloimonos

doi:10.1109/ICRA.2013.6631179

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

There has been recent interest in using contextual knowledge to improve multi-label visual recognition for intelligent agents like robots. Natural Language Processing (NLP) can give us labels, the correlation of labels, and the ontological knowledge about them, so we can automate the acquisition of contextual knowledge. In this paper we show how to use tools from NLP in conjunction with Vision to improve visual recognition. There are two major approaches: First, different language databases organize words according to various semantic concepts. Using these, we can build special-purpose databases that can predict the labels involved given a certain context. Here we build a knowledge base for the purpose of describing common daily activities. Second, statistical language tools can provide the correlations of different labels. We show a way to learn a language model from large corpus data that exploits these correlations and propose a general optimization scheme to integrate the language model into the system. Experiments conducted on three multi-label everyday recognition tasks support the effectiveness and efficiency of our approach, with significant gains in recognition accuracy when correlation information is used.

Original language	English (US)
Title of host publication	2013 IEEE International Conference on Robotics and Automation, ICRA 2013
Pages	4256-4262
Number of pages	7
DOIs	https://doi.org/10.1109/ICRA.2013.6631179
State	Published - 2013
Externally published	Yes
Event	2013 IEEE International Conference on Robotics and Automation, ICRA 2013 - Karlsruhe, Germany; Duration: May 6, 2013 → May 10, 2013

Publication series

Name	Proceedings - IEEE International Conference on Robotics and Automation
ISSN (Print)	1050-4729

Other

Other	2013 IEEE International Conference on Robotics and Automation, ICRA 2013
Country/Territory	Germany
City	Karlsruhe
Period	5/6/13 → 5/10/13

ASJC Scopus subject areas

Software
Artificial Intelligence
Electrical and Electronic Engineering
Control and Systems Engineering

Access to Document

10.1109/ICRA.2013.6631179

Cite this

Robots with language: Multi-label visual recognition using NLP. / Yang, Yezhou; Teo, Ching L.; Fermuller, Cornelia et al.
2013 IEEE International Conference on Robotics and Automation, ICRA 2013. 2013. p. 4256-4262 6631179 (Proceedings - IEEE International Conference on Robotics and Automation).

Yang, Y, Teo, CL, Fermuller, C & Aloimonos, Y 2013, Robots with language: Multi-label visual recognition using NLP. in 2013 IEEE International Conference on Robotics and Automation, ICRA 2013., 6631179, Proceedings - IEEE International Conference on Robotics and Automation, pp. 4256-4262, 2013 IEEE International Conference on Robotics and Automation, ICRA 2013, Karlsruhe, Germany, 5/6/13. https://doi.org/10.1109/ICRA.2013.6631179

@inproceedings{10c71b3f772c4e6891473c8f4b09b4ce,
  title = "Robots with language: Multi-label visual recognition using NLP",
  abstract = "There has been a recent interest in utilizing contextual knowledge to improve multi-label visual recognition for intelligent agents like robots. Natural Language Processing (NLP) can give us labels, the correlation of labels, and the ontological knowledge about them, so we can automate the acquisition of contextual knowledge. In this paper we show how to use tools from NLP in conjunction with Vision to improve visual recognition. There are two major approaches: First, different language databases organize words according to various semantic concepts. Using these, we can build special purpose databases that can predict the labels involved given a certain context. Here we build a knowledge base for the purpose of describing common daily activities. Second, statistical language tools can provide the correlations of different labels. We show a way to learn a language model from large corpus data that exploits these correlations and propose a general optimization scheme to integrate the language model into the system. Experiments conducted on three multi-label everyday recognition tasks support the effectiveness and efficiency of our approach, with significant gains in recognition accuracies when correlation information is used.",
  author = "Yezhou Yang and Teo, {Ching L.} and Cornelia Fermuller and Yiannis Aloimonos",
  year = "2013",
  doi = "10.1109/ICRA.2013.6631179",
  language = "English (US)",
  isbn = "9781467356411",
  series = "Proceedings - IEEE International Conference on Robotics and Automation",
  pages = "4256--4262",
  booktitle = "2013 IEEE International Conference on Robotics and Automation, ICRA 2013",
  note = "2013 IEEE International Conference on Robotics and Automation, ICRA 2013 ; Conference date: 06-05-2013 Through 10-05-2013",
}

TY - GEN
T1 - Robots with language
T2 - 2013 IEEE International Conference on Robotics and Automation, ICRA 2013
AU - Yang, Yezhou
AU - Teo, Ching L.
AU - Fermuller, Cornelia
AU - Aloimonos, Yiannis
PY - 2013
Y1 - 2013
N2 - There has been a recent interest in utilizing contextual knowledge to improve multi-label visual recognition for intelligent agents like robots. Natural Language Processing (NLP) can give us labels, the correlation of labels, and the ontological knowledge about them, so we can automate the acquisition of contextual knowledge. In this paper we show how to use tools from NLP in conjunction with Vision to improve visual recognition. There are two major approaches: First, different language databases organize words according to various semantic concepts. Using these, we can build special purpose databases that can predict the labels involved given a certain context. Here we build a knowledge base for the purpose of describing common daily activities. Second, statistical language tools can provide the correlations of different labels. We show a way to learn a language model from large corpus data that exploits these correlations and propose a general optimization scheme to integrate the language model into the system. Experiments conducted on three multi-label everyday recognition tasks support the effectiveness and efficiency of our approach, with significant gains in recognition accuracies when correlation information is used.
AB - There has been a recent interest in utilizing contextual knowledge to improve multi-label visual recognition for intelligent agents like robots. Natural Language Processing (NLP) can give us labels, the correlation of labels, and the ontological knowledge about them, so we can automate the acquisition of contextual knowledge. In this paper we show how to use tools from NLP in conjunction with Vision to improve visual recognition. There are two major approaches: First, different language databases organize words according to various semantic concepts. Using these, we can build special purpose databases that can predict the labels involved given a certain context. Here we build a knowledge base for the purpose of describing common daily activities. Second, statistical language tools can provide the correlations of different labels. We show a way to learn a language model from large corpus data that exploits these correlations and propose a general optimization scheme to integrate the language model into the system. Experiments conducted on three multi-label everyday recognition tasks support the effectiveness and efficiency of our approach, with significant gains in recognition accuracies when correlation information is used.
UR - http://www.scopus.com/inward/record.url?scp=84887289565&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84887289565&partnerID=8YFLogxK
U2 - 10.1109/ICRA.2013.6631179
DO - 10.1109/ICRA.2013.6631179
M3 - Conference contribution
AN - SCOPUS:84887289565
SN - 9781467356411
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 4256
EP - 4262
BT - 2013 IEEE International Conference on Robotics and Automation, ICRA 2013
Y2 - 6 May 2013 through 10 May 2013
ER -
