TY - GEN
T1 - Good, Better, Best
T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022
AU - Lu, Jiaying
AU - Ye, Xin
AU - Ren, Yi
AU - Yang, Yezhou
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Multiple-choice VQA has recently drawn increasing attention from researchers and end-users. As the demand for automatically constructing large-scale multiple-choice VQA data grows, we introduce a novel task called textual Distractors Generation for VQA (DG-VQA), which focuses on generating challenging yet meaningful distractors given the context image, question, and correct answer. The DG-VQA task aims to generate distractors without ground-truth training samples, since such resources are rarely available. To tackle DG-VQA in an unsupervised manner, we propose GOBBET, a reinforcement learning (RL) based framework that utilizes pre-trained VQA models as an alternative knowledge base to guide the distractor generation process. In GOBBET, a pre-trained VQA model serves as the environment in the RL setting, providing feedback for the input multi-modal query, while a neural distractor generator serves as the agent and takes actions accordingly. We propose to use the performance degradation of existing VQA models as an indicator of the quality of generated distractors. In addition, we show the utility of generated distractors through data augmentation experiments, since robustness is increasingly important when AI models are applied to unpredictable open-domain scenarios or security-sensitive applications. We further conduct a manual case study of the factors that allow distractors generated by GOBBET to fool existing models.
AB - Multiple-choice VQA has recently drawn increasing attention from researchers and end-users. As the demand for automatically constructing large-scale multiple-choice VQA data grows, we introduce a novel task called textual Distractors Generation for VQA (DG-VQA), which focuses on generating challenging yet meaningful distractors given the context image, question, and correct answer. The DG-VQA task aims to generate distractors without ground-truth training samples, since such resources are rarely available. To tackle DG-VQA in an unsupervised manner, we propose GOBBET, a reinforcement learning (RL) based framework that utilizes pre-trained VQA models as an alternative knowledge base to guide the distractor generation process. In GOBBET, a pre-trained VQA model serves as the environment in the RL setting, providing feedback for the input multi-modal query, while a neural distractor generator serves as the agent and takes actions accordingly. We propose to use the performance degradation of existing VQA models as an indicator of the quality of generated distractors. In addition, we show the utility of generated distractors through data augmentation experiments, since robustness is increasingly important when AI models are applied to unpredictable open-domain scenarios or security-sensitive applications. We further conduct a manual case study of the factors that allow distractors generated by GOBBET to fool existing models.
UR - http://www.scopus.com/inward/record.url?scp=85137760866&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137760866&partnerID=8YFLogxK
U2 - 10.1109/CVPRW56347.2022.00539
DO - 10.1109/CVPRW56347.2022.00539
M3 - Conference contribution
AN - SCOPUS:85137760866
T3 - IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
SP - 4917
EP - 4926
BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022
PB - IEEE Computer Society
Y2 - 19 June 2022 through 20 June 2022
ER -