Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering

Man Luo; Yankai Zeng; Pratyay Banerjee; Chitta Baral

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Scopus citations

Abstract

Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images. The dataset most commonly used to evaluate knowledge-based VQA is OK-VQA, but it lacks a gold-standard knowledge corpus for retrieval. Existing work leverages different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge. Because the knowledge bases vary, it is hard to compare models' performance fairly. To address this issue, we collect a natural language knowledge base that can be used for any VQA system. Moreover, we propose a Visual Retriever-Reader pipeline to approach knowledge-based VQA. The visual retriever aims to retrieve relevant knowledge, and the visual reader seeks to predict answers based on the given knowledge. We introduce various ways to retrieve knowledge using text and images and two reader styles: classification and extraction. Both the retriever and reader are trained with weak supervision. Our experimental results show that a good retriever can significantly improve the reader's performance on the OK-VQA challenge. The code and corpus are provided in this link.
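
The retriever-reader flow described in the abstract can be illustrated with a toy, text-only sketch. It is not the authors' implementation. It assumes the image is stood in for by a caption string, the retriever scores passages by simple term overlap, weak supervision means treating a passage as positive when it contains an answer string, and the reader is a classification-style choice over a small answer vocabulary. All function names, the stopword list, and the example data are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list, only large enough for this toy example.
STOPWORDS = {"a", "is", "the", "to", "in", "on", "for", "this", "what", "used", "are"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def retrieve(question, caption, corpus, k=1):
    """Toy retriever: rank passages by term overlap with question + caption."""
    query = set(tokenize(question + " " + caption))
    scored = []
    for idx, passage in enumerate(corpus):
        overlap = len(query & set(tokenize(passage)))
        scored.append((-overlap, idx))  # negative so a plain sort ranks best first
    scored.sort()
    return [corpus[idx] for _, idx in scored[:k]]


def weak_label(passage, answers):
    """Assumed weak supervision: positive if the passage contains any answer string."""
    text = passage.lower()
    return any(answer.lower() in text for answer in answers)


def read(passages, answer_vocab):
    """Toy classification-style reader: pick the vocabulary answer
    mentioned most often in the retrieved passages (single-word answers only)."""
    counts = Counter(t for p in passages for t in tokenize(p))
    return max(answer_vocab, key=lambda a: counts[a.lower()])


corpus = [
    "A racket is used to hit the ball in tennis.",
    "Bananas are rich in potassium.",
    "A surfboard is used for surfing on ocean waves.",
]
question = "What sport is this used for?"
caption = "a tennis racket lying on a court"  # stands in for the image content

retrieved = retrieve(question, caption, corpus, k=1)
labels = [weak_label(p, ["tennis"]) for p in corpus]
prediction = read(retrieved, ["surfing", "tennis", "potassium"])
print(retrieved, labels, prediction)
```

In this sketch the reader only sees what the retriever returns, which is why retrieval quality bounds answer quality; an extraction-style reader would instead select an answer span from the retrieved passage text.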

Original language	English (US)
Title of host publication	EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings
Publisher	Association for Computational Linguistics (ACL)
Pages	6417-6431
Number of pages	15
ISBN (Electronic)	9781955917094
State	Published - 2021
Event	2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Virtual, Punta Cana, Dominican Republic Duration: Nov 7 2021 → Nov 11 2021

Publication series

Name	EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

Conference

Conference	2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Country/Territory	Dominican Republic
City	Virtual, Punta Cana
Period	11/7/21 → 11/11/21

ASJC Scopus subject areas

Computational Theory and Mathematics
Computer Science Applications
Information Systems

Cite this

Luo, M., Zeng, Y., Banerjee, P., & Baral, C. (2021). Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 6417-6431). (EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings). Association for Computational Linguistics (ACL).

Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering. / Luo, Man; Zeng, Yankai; Banerjee, Pratyay et al.
EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings. Association for Computational Linguistics (ACL), 2021. p. 6417-6431 (EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings).

Luo, M, Zeng, Y, Banerjee, P & Baral, C 2021, Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering. in EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings. EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, Association for Computational Linguistics (ACL), pp. 6417-6431, 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual, Punta Cana, Dominican Republic, 11/7/21.

Luo M, Zeng Y, Banerjee P, Baral C. Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings. Association for Computational Linguistics (ACL). 2021. p. 6417-6431. (EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings).

Luo, Man ; Zeng, Yankai ; Banerjee, Pratyay et al. / Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering. EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings. Association for Computational Linguistics (ACL), 2021. pp. 6417-6431 (EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings).

@inproceedings{e83ae06acf5c4182bbf9346f7c2055bd,

title = "Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering",

abstract = "Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images. The dataset most commonly used to evaluate knowledge-based VQA is OK-VQA, but it lacks a gold-standard knowledge corpus for retrieval. Existing work leverages different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge. Because the knowledge bases vary, it is hard to compare models' performance fairly. To address this issue, we collect a natural language knowledge base that can be used for any VQA system. Moreover, we propose a Visual Retriever-Reader pipeline to approach knowledge-based VQA. The visual retriever aims to retrieve relevant knowledge, and the visual reader seeks to predict answers based on the given knowledge. We introduce various ways to retrieve knowledge using text and images and two reader styles: classification and extraction. Both the retriever and reader are trained with weak supervision. Our experimental results show that a good retriever can significantly improve the reader's performance on the OK-VQA challenge. The code and corpus are provided in this link.",

author = "Man Luo and Yankai Zeng and Pratyay Banerjee and Chitta Baral",

note = "Funding Information: The authors acknowledge support from the NSF grant 1816039, DARPA grant W911NF2020006, DARPA grant FA875019C0003, and ONR award N00014-20-1-2332; and thank the reviewers for their feedback. Publisher Copyright: {\textcopyright} 2021 Association for Computational Linguistics; 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 ; Conference date: 07-11-2021 Through 11-11-2021",

year = "2021",

language = "English (US)",

series = "EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings",

publisher = "Association for Computational Linguistics (ACL)",

pages = "6417--6431",

booktitle = "EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings",

}

TY - GEN

T1 - Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering

AU - Luo, Man

AU - Zeng, Yankai

AU - Banerjee, Pratyay

AU - Baral, Chitta

N1 - Funding Information: The authors acknowledge support from the NSF grant 1816039, DARPA grant W911NF2020006, DARPA grant FA875019C0003, and ONR award N00014-20-1-2332; and thank the reviewers for their feedback. Publisher Copyright: © 2021 Association for Computational Linguistics

PY - 2021

Y1 - 2021

AB - Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images. The dataset most commonly used to evaluate knowledge-based VQA is OK-VQA, but it lacks a gold-standard knowledge corpus for retrieval. Existing work leverages different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge. Because the knowledge bases vary, it is hard to compare models' performance fairly. To address this issue, we collect a natural language knowledge base that can be used for any VQA system. Moreover, we propose a Visual Retriever-Reader pipeline to approach knowledge-based VQA. The visual retriever aims to retrieve relevant knowledge, and the visual reader seeks to predict answers based on the given knowledge. We introduce various ways to retrieve knowledge using text and images and two reader styles: classification and extraction. Both the retriever and reader are trained with weak supervision. Our experimental results show that a good retriever can significantly improve the reader's performance on the OK-VQA challenge. The code and corpus are provided in this link.

UR - http://www.scopus.com/inward/record.url?scp=85127383139&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85127383139&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85127383139

T3 - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

SP - 6417

EP - 6431

BT - EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings

PB - Association for Computational Linguistics (ACL)

T2 - 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021

Y2 - 7 November 2021 through 11 November 2021

ER -
