Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning

Jiaying Lu; Xin Ye; Yi Ren; Yezhou Yang

doi:10.1109/CVPRW56347.2022.00539

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Scopus citations

Abstract

Multiple-choice VQA has recently drawn increasing attention from researchers and end-users. As the demand for automatically constructing large-scale multiple-choice VQA data grows, we introduce a novel task called textual Distractors Generation for VQA (DG-VQA), which focuses on generating challenging yet meaningful distractors given the context image, question, and correct answer. The DG-VQA task aims to generate distractors without ground-truth training samples, since such resources are rarely available. To tackle DG-VQA in an unsupervised manner, we propose GOBBET, a reinforcement learning (RL) based framework that uses pre-trained VQA models as an alternative knowledge base to guide the distractor generation process. In GOBBET, a pre-trained VQA model serves as the environment in the RL setting and provides feedback for the input multi-modal query, while a neural distractor generator serves as the agent and takes actions accordingly. We propose to use the performance degradation of existing VQA models as an indicator of the quality of generated distractors. In addition, we show the utility of the generated distractors through data augmentation experiments, since robustness is increasingly important when AI models are applied to unpredictable open-domain scenarios or security-sensitive applications. We further conduct a manual case study of the factors that allow distractors generated by GOBBET to fool existing models.
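The agent/environment split described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the abstract does not specify the reward function, the generator architecture, or the training algorithm, so a plain REINFORCE update over a fixed candidate list stands in for the neural distractor generator, and a hand-written scoring table (`toy_vqa_scores`) stands in for a pre-trained VQA model. The candidate words and scores are invented and are not results from the paper.

```python
import math
import random

# Hypothetical candidates for one (image, question, answer) triple.
CANDIDATES = ["cat", "wolf", "table", "blue"]
CORRECT = "dog"


def toy_vqa_scores(options):
    """Stand-in for a pre-trained VQA model (the RL environment).

    Returns one score per option. The values are hand-written for this
    illustration, not real model output.
    """
    plausibility = {"dog": 0.6, "wolf": 0.7, "cat": 0.4, "table": 0.05, "blue": 0.01}
    return [plausibility[o] for o in options]


def reward(distractor):
    """Assumed reward: 1.0 if the toy model prefers the distractor to the answer."""
    s_correct, s_distractor = toy_vqa_scores([CORRECT, distractor])
    return 1.0 if s_distractor > s_correct else 0.0


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def train(steps=500, lr=0.5, seed=0):
    """Agent: a softmax policy over CANDIDATES, trained with REINFORCE."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(CANDIDATES)), weights=probs)[0]
        r = reward(CANDIDATES[action])
        # Gradient of log pi(action) w.r.t. logit i is (1[i == action] - probs[i]).
        for i in range(len(logits)):
            indicator = 1.0 if i == action else 0.0
            logits[i] += lr * r * (indicator - probs[i])
    return softmax(logits)


probs = train()
best = CANDIDATES[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

In this toy setup only the candidate that the stand-in model scores above the correct answer earns a reward, so the policy concentrates its probability mass on it. A real system would replace the lookup table with an actual VQA model and the candidate list with a generative model over text.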

Original language	English (US)
Title of host publication	Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022
Publisher	IEEE Computer Society
Pages	4917-4926
Number of pages	10
ISBN (Electronic)	9781665487399
DOIs	https://doi.org/10.1109/CVPRW56347.2022.00539
State	Published - 2022
Event	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022 - New Orleans, United States; Duration: Jun 19 2022 → Jun 20 2022

Publication series

Name	IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume	2022-June
ISSN (Print)	2160-7508
ISSN (Electronic)	2160-7516

Conference

Conference	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022
Country/Territory	United States
City	New Orleans
Period	6/19/22 → 6/20/22

ASJC Scopus subject areas

Computer Vision and Pattern Recognition
Electrical and Electronic Engineering

Cite this

Lu, J., Ye, X., Ren, Y., & Yang, Y. (2022). Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning. In Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022 (pp. 4917-4926). (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Vol. 2022-June). IEEE Computer Society. https://doi.org/10.1109/CVPRW56347.2022.00539

Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning. / Lu, Jiaying; Ye, Xin; Ren, Yi et al.
Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022. IEEE Computer Society, 2022. p. 4917-4926 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops; Vol. 2022-June).

Lu, J, Ye, X, Ren, Y & Yang, Y 2022, Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning. in Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, vol. 2022-June, IEEE Computer Society, pp. 4917-4926, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022, New Orleans, United States, 6/19/22. https://doi.org/10.1109/CVPRW56347.2022.00539

Lu J, Ye X, Ren Y, Yang Y. Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning. In Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022. IEEE Computer Society. 2022. p. 4917-4926. (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops). doi: 10.1109/CVPRW56347.2022.00539

Lu, Jiaying; Ye, Xin; Ren, Yi et al. / Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning. Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022. IEEE Computer Society, 2022. pp. 4917-4926 (IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops).

@inproceedings{4e08b53813f64ceb8f70afc070533f5c,

title = "Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning",

author = "Jiaying Lu and Xin Ye and Yi Ren and Yezhou Yang",

note = "Publisher Copyright: {\textcopyright} 2022 IEEE.; 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022 ; Conference date: 19-06-2022 Through 20-06-2022",

year = "2022",

doi = "10.1109/CVPRW56347.2022.00539",

language = "English (US)",

series = "IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops",

publisher = "IEEE Computer Society",

pages = "4917--4926",

booktitle = "Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022",

}

TY - GEN

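T1 - Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning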

T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022

AU - Lu, Jiaying

AU - Ye, Xin

AU - Ren, Yi

AU - Yang, Yezhou

PY - 2022

Y1 - 2022

UR - http://www.scopus.com/inward/record.url?scp=85137760866&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85137760866&partnerID=8YFLogxK

U2 - 10.1109/CVPRW56347.2022.00539

DO - 10.1109/CVPRW56347.2022.00539

M3 - Conference contribution

AN - SCOPUS:85137760866

T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops

SP - 4917

EP - 4926

BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022

PB - IEEE Computer Society

Y2 - 19 June 2022 through 20 June 2022

ER -
