Corpus-guided sentence generation of natural images

Yezhou Yang, Ching Lik Teo, Hal Daumé, Yiannis Aloimonos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

192 Scopus citations

Abstract

We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The input is a set of initial noisy estimates of the objects and scenes detected in the image using state-of-the-art trained detectors. As predicting actions from still images directly is unreliable, we use a language model trained on the English Gigaword corpus to obtain their estimates, together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters of an HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.
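As a rough illustration of the idea in the abstract, the sketch below decodes a three-slot chain (noun, verb, scene) in which each hidden node is a sentence component, detector confidences act as emissions, and corpus-style conditional probabilities act as transitions. It is not the authors' model: the function name `decode`, the candidate words, and every probability are invented for illustration, and Viterbi decoding is assumed here as a standard way to find the most likely path in an HMM, since the abstract does not name the inference procedure. The verbs get uniform emissions to mimic the case where actions cannot be detected reliably from a still image, so the language statistics alone pick the verb.

```python
import math


def decode(slots, emission, transition):
    """Viterbi decoding over a chain of slots, each with its own candidate words.

    slots:      list of candidate-word lists, one per sentence component
    emission:   emission[word] = hypothetical detector confidence for that word
    transition: transition[(prev_word, word)] = hypothetical corpus probability
    """
    # score[w] = best log-probability of any path ending in word w
    score = {w: math.log(emission[w]) for w in slots[0]}
    backpointers = []
    for candidates in slots[1:]:
        new_score, pointers = {}, {}
        for w in candidates:
            # Pick the predecessor that maximizes the path score into w.
            prev = max(score, key=lambda p: score[p] + math.log(transition[(p, w)]))
            new_score[w] = (
                score[prev] + math.log(transition[(prev, w)]) + math.log(emission[w])
            )
            pointers[w] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final word.
    word = max(score, key=score.get)
    path = [word]
    for pointers in reversed(backpointers):
        word = pointers[word]
        path.append(word)
    return path[::-1]


# Toy inputs (all values are made up, not taken from the paper).
slots = [["dog", "cat"], ["run", "sleep"], ["field", "bedroom"]]
emission = {
    "dog": 0.6, "cat": 0.4,        # noisy object detections
    "run": 0.5, "sleep": 0.5,      # no visual evidence for actions
    "field": 0.7, "bedroom": 0.3,  # noisy scene detections
}
transition = {
    ("dog", "run"): 0.8, ("dog", "sleep"): 0.2,
    ("cat", "run"): 0.3, ("cat", "sleep"): 0.7,
    ("run", "field"): 0.9, ("run", "bedroom"): 0.1,
    ("sleep", "field"): 0.2, ("sleep", "bedroom"): 0.8,
}

print(decode(slots, emission, transition))  # ['dog', 'run', 'field']
```

With vision alone, the two verbs are tied. The transition terms break the tie in favor of "run", which is the kind of effect the abstract attributes to combining vision and language.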

Original language: English (US)
Title of host publication: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 444-454
Number of pages: 11
State: Published - Oct 3 2011
Externally published: Yes
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Edinburgh, United Kingdom
Duration: Jul 27 2011 to Jul 31 2011

Publication series

Name: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Other

Other: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011
Country: United Kingdom
City: Edinburgh
Period: 7/27/11 to 7/31/11

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Yang, Y., Teo, C. L., Daumé, H., & Aloimonos, Y. (2011). Corpus-guided sentence generation of natural images. In EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 444-454).