Corpus-guided sentence generation of natural images

Yezhou Yang, Ching Lik Teo, Hal Daumé, Yiannis Aloimonos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

178 Citations (Scopus)

Abstract

We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The inputs are initial noisy estimates of the objects and scenes detected in the image using state-of-the-art trained detectors. Because predicting actions directly from still images is unreliable, we use a language model trained on the English Gigaword corpus to obtain their estimates, together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters of an HMM that models the sentence generation process, with sentence components as hidden nodes and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.
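The HMM formulation described in the abstract can be illustrated with a minimal Viterbi-decoding sketch. This is not the authors' code: the candidate words, priors, transition (co-location) probabilities, and emission probabilities below are all made-up placeholders, standing in for the corpus-derived and detector-derived estimates the paper describes.

```python
import numpy as np

# Hypothetical candidate sentence components (hidden states).
states = ["dog", "person", "park"]
start_p = np.array([0.4, 0.4, 0.2])      # corpus-derived priors (assumed)
trans_p = np.array([                     # co-location probabilities (assumed)
    [0.1, 0.4, 0.5],
    [0.3, 0.1, 0.6],
    [0.4, 0.5, 0.1],
])
emit_p = np.array([                      # P(detection | component) (assumed)
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
])

def viterbi(obs):
    """Most likely sequence of sentence components given noisy detections."""
    T, N = len(obs), len(states)
    delta = np.zeros((T, N))             # best log-prob ending in each state
    back = np.zeros((T, N), dtype=int)   # backpointers for path recovery
    delta[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(trans_p)
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(emit_p[:, obs[t]])
    path = [int(delta[-1].argmax())]     # trace the best path backwards
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [states[i] for i in reversed(path)]

# Observations: indices of detector firings over the image.
print(viterbi([0, 1, 2]))
```

The decoded state sequence would then be slotted into a sentence template (noun, verb, scene, preposition), which is the core-structure prediction step the abstract refers to.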

Original language: English (US)
Title of host publication: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 444-454
Number of pages: 11
State: Published - 2011
Externally published: Yes
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Edinburgh, United Kingdom
Duration: Jul 27 2011 - Jul 31 2011

Other

Other: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011
Country: United Kingdom
City: Edinburgh
Period: 7/27/11 - 7/31/11

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Yang, Y., Teo, C. L., Daumé, H., & Aloimonos, Y. (2011). Corpus-guided sentence generation of natural images. In EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 444-454)

@inproceedings{5cdbb83516044984bf9138d6fe936bec,
title = "Corpus-guided sentence generation of natural images",
abstract = "We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The inputs are initial noisy estimates of the objects and scenes detected in the image using state-of-the-art trained detectors. Because predicting actions directly from still images is unreliable, we use a language model trained on the English Gigaword corpus to obtain their estimates, together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters of an HMM that models the sentence generation process, with sentence components as hidden nodes and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.",
author = "Yezhou Yang and Teo, {Ching Lik} and Hal Daum{\'e} and Yiannis Aloimonos",
year = "2011",
language = "English (US)",
isbn = "1937284115",
pages = "444--454",
booktitle = "EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference",

}
