Corpus-guided sentence generation of natural images

Yezhou Yang; Ching Lik Teo; Hal Daumé; Yiannis Aloimonos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The inputs are initial noisy estimates of the objects and scenes detected in the image using state-of-the-art trained detectors. As predicting actions directly from still images is unreliable, we use a language model trained on the English Gigaword corpus to obtain their estimates, together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters of an HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces more readable and descriptive sentences than naive strategies that use vision alone.
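The abstract describes an HMM whose hidden nodes are sentence components and whose emissions are image detections. As a rough illustration only, and not the authors' implementation, the sketch below decodes the most likely (noun, verb, scene, preposition) sequence with standard Viterbi decoding, assumed here as the inference step. The slot order, the names `SLOTS` and `viterbi`, the `floor` smoothing for unseen word pairs, and the choice to treat slots without detections (verb, preposition) as uninformative are all assumptions. Every probability is a made-up toy value, not a Gigaword statistic or a detector output.

```python
import math
from typing import Dict, List, Tuple

# Assumed slot order: one hidden node per sentence component.
SLOTS = ["noun", "verb", "scene", "preposition"]


def viterbi(
    candidates: Dict[str, List[str]],
    emission: Dict[str, Dict[str, float]],
    transition: Dict[Tuple[str, str], float],
    floor: float = 1e-6,
) -> Tuple[List[str], float]:
    """Return the most likely word per slot and that path's log-probability.

    candidates: words allowed in each slot.
    emission:   P(image detection | word) for slots that have detections;
                slots absent from this dict contribute log(1) = 0 (uninformative).
    transition: corpus-style P(word_t | word_{t-1}); unseen pairs get `floor`.
    """

    def log_emit(slot: str, word: str) -> float:
        probs = emission.get(slot)
        if probs is None:  # no visual evidence for this slot
            return 0.0
        return math.log(probs.get(word, floor))

    first = SLOTS[0]
    # best[w] = (log-prob of the best path ending in w, that path)
    best = {w: (log_emit(first, w), [w]) for w in candidates[first]}
    for slot in SLOTS[1:]:
        new_best = {}
        for w in candidates[slot]:
            # Pick the predecessor that maximizes path score + transition score.
            score, path = max(
                (s + math.log(transition.get((prev, w), floor)), p)
                for prev, (s, p) in best.items()
            )
            new_best[w] = (score + log_emit(slot, w), path + [w])
        best = new_best
    score, path = max(best.values())
    return path, score


# Hypothetical toy inputs for illustration.
candidates = {
    "noun": ["person", "dog"],
    "verb": ["ride", "walk"],
    "scene": ["street", "field"],
    "preposition": ["on", "in"],
}
emission = {
    "noun": {"person": 0.7, "dog": 0.3},
    "scene": {"street": 0.6, "field": 0.4},
}
transition = {
    ("person", "ride"): 0.5, ("person", "walk"): 0.5,
    ("dog", "ride"): 0.1, ("dog", "walk"): 0.9,
    ("ride", "street"): 0.7, ("ride", "field"): 0.3,
    ("walk", "street"): 0.4, ("walk", "field"): 0.6,
    ("street", "on"): 0.8, ("street", "in"): 0.2,
    ("field", "on"): 0.3, ("field", "in"): 0.7,
}

path, score = viterbi(candidates, emission, transition)
print(path, round(math.exp(score), 4))  # ['person', 'ride', 'street', 'on'] 0.1176
```

With these toy numbers the language-model transitions pull "dog" away from "ride" and favor "street" followed by "on", so the decoded path combines detector confidence with corpus co-occurrence rather than relying on either alone. Working in log space avoids underflow when the chain or vocabulary grows.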

Original language	English (US)
Title of host publication	EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages	444-454
Number of pages	11
State	Published - 2011
Externally published	Yes
Event	Conference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Edinburgh, United Kingdom Duration: Jul 27 2011 → Jul 31 2011

Publication series

Name	EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

Other

Other	Conference on Empirical Methods in Natural Language Processing, EMNLP 2011
Country/Territory	United Kingdom
City	Edinburgh
Period	7/27/11 → 7/31/11

ASJC Scopus subject areas

Computational Theory and Mathematics
Computer Science Applications
Information Systems

Cite this

Corpus-guided sentence generation of natural images. / Yang, Yezhou; Teo, Ching Lik; Daumé, Hal et al.
EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. 2011. p. 444-454 (EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference).

Yang, Y, Teo, CL, Daumé, H & Aloimonos, Y 2011, Corpus-guided sentence generation of natural images. in EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference. EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, pp. 444-454, Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, Edinburgh, United Kingdom, 7/27/11.

@inproceedings{5cdbb83516044984bf9138d6fe936bec,

title = "Corpus-guided sentence generation of natural images",

abstract = "We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The input are initial noisy estimates of the objects and scenes detected in the image using state of the art trained detectors. As predicting actions from still images directly is unreliable, we use a language model trained from the English Gigaword corpus to obtain their estimates; together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters on a HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.",

author = "Yezhou Yang and Teo, {Ching Lik} and Hal Daum{\'e} and Yiannis Aloimonos",

year = "2011",

language = "English (US)",

isbn = "1937284115",

series = "EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference",

pages = "444--454",

booktitle = "EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference",

note = "Conference on Empirical Methods in Natural Language Processing, EMNLP 2011 ; Conference date: 27-07-2011 Through 31-07-2011",

}

TY - GEN

T1 - Corpus-guided sentence generation of natural images

AU - Yang, Yezhou

AU - Teo, Ching Lik

AU - Daumé, Hal

AU - Aloimonos, Yiannis

PY - 2011

Y1 - 2011

N2 - We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The input are initial noisy estimates of the objects and scenes detected in the image using state of the art trained detectors. As predicting actions from still images directly is unreliable, we use a language model trained from the English Gigaword corpus to obtain their estimates; together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters on a HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.

AB - We propose a sentence generation strategy that describes images by predicting the most likely nouns, verbs, scenes and prepositions that make up the core sentence structure. The input are initial noisy estimates of the objects and scenes detected in the image using state of the art trained detectors. As predicting actions from still images directly is unreliable, we use a language model trained from the English Gigaword corpus to obtain their estimates; together with probabilities of co-located nouns, scenes and prepositions. We use these estimates as parameters on a HMM that models the sentence generation process, with hidden nodes as sentence components and image detections as the emissions. Experimental results show that our strategy of combining vision and language produces readable and descriptive sentences compared to naive strategies that use vision alone.

UR - http://www.scopus.com/inward/record.url?scp=80053258778&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053258778&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:80053258778

SN - 1937284115

SN - 9781937284114

T3 - EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

SP - 444

EP - 454

BT - EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference

T2 - Conference on Empirical Methods in Natural Language Processing, EMNLP 2011

Y2 - 27 July 2011 through 31 July 2011

ER -