Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness

Tejas Gokhale; Swaroop Mishra; Man Luo; Bhavdeep Singh Sachdeva; Chitta Baral

Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness

Tejas Gokhale, Swaroop Mishra, Man Luo, Bhavdeep Singh Sachdeva, Chitta Baral

Engineering, Ira A. Fulton Schools of (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

Data modification, either via additional training datasets, data augmentation, debiasing, and dataset filtering, has been proposed as an effective solution for generalizing to out-of-domain (OOD) inputs, in both natural language processing and computer vision literature. However, the effect of data modification on adversarial robustness remains unclear. In this work, we conduct a comprehensive study of common data modification strategies and evaluate not only their in-domain and OOD performance, but also their adversarial robustness (AR). We also present results on a two-dimensional synthetic dataset to visualize the effect of each method on the training distribution. This work serves as an empirical study towards understanding the relationship between generalizing to unseen domains and defending against adversarial perturbations. Our findings suggest that more data (either via additional datasets or data augmentation) benefits both OOD accuracy and AR. However, data filtering (previously shown to improve OOD accuracy on natural language inference) hurts OOD accuracy on other tasks such as question answering and image classification. We provide insights from our experiments to inform future work in this direction.

Original language	English (US)
Title of host publication	ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022
Editors	Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Publisher	Association for Computational Linguistics (ACL)
Pages	2705-2718
Number of pages	14
ISBN (Electronic)	9781955917254
State	Published - 2022
Event	60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 - Dublin, Ireland Duration: May 22 2022 → May 27 2022

Publication series

Name	Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print)	0736-587X

Conference

Conference	60th Annual Meeting of the Association for Computational Linguistics, ACL 2022
Country/Territory	Ireland
City	Dublin
Period	5/22/22 → 5/27/22

ASJC Scopus subject areas

Computer Science Applications
Linguistics and Language
Language and Linguistics

Cite this

Gokhale, T., Mishra, S., Luo, M., Sachdeva, B. S., & Baral, C. (2022). Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness. In S. Muresan, P. Nakov, & A. Villavicencio (Eds.), ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022 (pp. 2705-2718). (Proceedings of the Annual Meeting of the Association for Computational Linguistics). Association for Computational Linguistics (ACL).

Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness. / Gokhale, Tejas; Mishra, Swaroop; Luo, Man et al.
ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022. ed. / Smaranda Muresan; Preslav Nakov; Aline Villavicencio. Association for Computational Linguistics (ACL), 2022. p. 2705-2718 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Gokhale, T, Mishra, S, Luo, M, Sachdeva, BS & Baral, C 2022, Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness. in S Muresan, P Nakov & A Villavicencio (eds), ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022. Proceedings of the Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (ACL), pp. 2705-2718, 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022, Dublin, Ireland, 5/22/22.

Gokhale T, Mishra S, Luo M, Sachdeva BS, Baral C. Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness. In Muresan S, Nakov P, Villavicencio A, editors, ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022. Association for Computational Linguistics (ACL). 2022. p. 2705-2718. (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

Gokhale, Tejas ; Mishra, Swaroop ; Luo, Man et al. / Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness. ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022. editor / Smaranda Muresan ; Preslav Nakov ; Aline Villavicencio. Association for Computational Linguistics (ACL), 2022. pp. 2705-2718 (Proceedings of the Annual Meeting of the Association for Computational Linguistics).

@inproceedings{efe0775c240e4732adfc09f0f5232276,

title = "Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness",

abstract = "Data modification, either via additional training datasets, data augmentation, debiasing, and dataset filtering, has been proposed as an effective solution for generalizing to out-of-domain (OOD) inputs, in both natural language processing and computer vision literature. However, the effect of data modification on adversarial robustness remains unclear. In this work, we conduct a comprehensive study of common data modification strategies and evaluate not only their in-domain and OOD performance, but also their adversarial robustness (AR). We also present results on a two-dimensional synthetic dataset to visualize the effect of each method on the training distribution. This work serves as an empirical study towards understanding the relationship between generalizing to unseen domains and defending against adversarial perturbations. Our findings suggest that more data (either via additional datasets or data augmentation) benefits both OOD accuracy and AR. However, data filtering (previously shown to improve OOD accuracy on natural language inference) hurts OOD accuracy on other tasks such as question answering and image classification. We provide insights from our experiments to inform future work in this direction.",

author = "Tejas Gokhale and Swaroop Mishra and Man Luo and Sachdeva, {Bhavdeep Singh} and Chitta Baral",

note = "Publisher Copyright: {\textcopyright} 2022 Association for Computational Linguistics.; 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 ; Conference date: 22-05-2022 Through 27-05-2022",

year = "2022",

language = "English (US)",

series = "Proceedings of the Annual Meeting of the Association for Computational Linguistics",

publisher = "Association for Computational Linguistics (ACL)",

pages = "2705--2718",

editor = "Smaranda Muresan and Preslav Nakov and Aline Villavicencio",

booktitle = "ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022",

}

TY - GEN

T1 - Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness

AU - Gokhale, Tejas

AU - Mishra, Swaroop

AU - Luo, Man

AU - Sachdeva, Bhavdeep Singh

AU - Baral, Chitta

PY - 2022

Y1 - 2022

N2 - Data modification, either via additional training datasets, data augmentation, debiasing, and dataset filtering, has been proposed as an effective solution for generalizing to out-of-domain (OOD) inputs, in both natural language processing and computer vision literature. However, the effect of data modification on adversarial robustness remains unclear. In this work, we conduct a comprehensive study of common data modification strategies and evaluate not only their in-domain and OOD performance, but also their adversarial robustness (AR). We also present results on a two-dimensional synthetic dataset to visualize the effect of each method on the training distribution. This work serves as an empirical study towards understanding the relationship between generalizing to unseen domains and defending against adversarial perturbations. Our findings suggest that more data (either via additional datasets or data augmentation) benefits both OOD accuracy and AR. However, data filtering (previously shown to improve OOD accuracy on natural language inference) hurts OOD accuracy on other tasks such as question answering and image classification. We provide insights from our experiments to inform future work in this direction.

AB - Data modification, either via additional training datasets, data augmentation, debiasing, and dataset filtering, has been proposed as an effective solution for generalizing to out-of-domain (OOD) inputs, in both natural language processing and computer vision literature. However, the effect of data modification on adversarial robustness remains unclear. In this work, we conduct a comprehensive study of common data modification strategies and evaluate not only their in-domain and OOD performance, but also their adversarial robustness (AR). We also present results on a two-dimensional synthetic dataset to visualize the effect of each method on the training distribution. This work serves as an empirical study towards understanding the relationship between generalizing to unseen domains and defending against adversarial perturbations. Our findings suggest that more data (either via additional datasets or data augmentation) benefits both OOD accuracy and AR. However, data filtering (previously shown to improve OOD accuracy on natural language inference) hurts OOD accuracy on other tasks such as question answering and image classification. We provide insights from our experiments to inform future work in this direction.

UR - http://www.scopus.com/inward/record.url?scp=85134661652&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85134661652&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85134661652

T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics

SP - 2705

EP - 2718

BT - ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, Findings of ACL 2022

A2 - Muresan, Smaranda

A2 - Nakov, Preslav

A2 - Villavicencio, Aline

PB - Association for Computational Linguistics (ACL)

T2 - 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022

Y2 - 22 May 2022 through 27 May 2022

ER -

Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness

Abstract

Publication series

Conference

ASJC Scopus subject areas

Other files and links

Fingerprint

Cite this