TY - CONF
T1 - SUPER-NATURALINSTRUCTIONS: Generalization via Declarative Instructions on 1600+ NLP Tasks
T2 - 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
AU - Wang, Yizhong
AU - Mishra, Swaroop
AU - Alipoormolabashi, Pegah
AU - Kordi, Yeganeh
AU - Mirzaei, Amirreza
AU - Arunkumar, Anjana
AU - Ashok, Arjun
AU - Dhanasekaran, Arut Selvan
AU - Naik, Atharva
AU - Stap, David
AU - Pathak, Eshaan
AU - Karamanolakis, Giannis
AU - Lai, Haizhi Gary
AU - Purohit, Ishan
AU - Mondal, Ishani
AU - Anderson, Jacob
AU - Kuznia, Kirby
AU - Doshi, Krima
AU - Patel, Maitreya
AU - Pal, Kuntal Kumar
AU - Moradshahi, Mehrad
AU - Parmar, Mihir
AU - Purohit, Mirali
AU - Varshney, Neeraj
AU - Kaza, Phani Rohitha
AU - Verma, Pulkit
AU - Puri, Ravsehaj Singh
AU - Karia, Rushang
AU - Sampat, Shailaja Keyur
AU - Doshi, Savan
AU - Mishra, Siddhartha
AU - Reddy, Sujan
AU - Patro, Sumanta
AU - Dixit, Tanay
AU - Shen, Xudong
AU - Baral, Chitta
AU - Choi, Yejin
AU - Smith, Noah A.
AU - Hajishirzi, Hannaneh
AU - Khashabi, Daniel
N1 - Funding Information:
We thank the anonymous reviewers, our colleagues from AI2 and UWNLP, especially Matthew Peters for his encouraging conversations that motivated this project. We also thank the student contributors of Arizona State University's CSE 576 “Topics in NLP” course and all other contributors to our data repository. All experiments were run on AI2's Beaker GPU clusters and Google's research TPUs. This work was supported in part by ONR MURI N00014-18-1-2670, ONR N00014-18-1-2826, and DARPA MCS N66001-19-2-4031 grants.
Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce SUPER-NATURALINSTRUCTIONS, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting, and text composition. This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions: training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones. Furthermore, we build Tk-INSTRUCT, a transformer model trained to follow a variety of in-context instructions (plain language task definitions or k-shot examples). Our experiments show that Tk-INSTRUCT outperforms existing instruction-following models such as InstructGPT by over 9% on our benchmark despite being an order of magnitude smaller. We further analyze generalization as a function of various scaling parameters, such as the number of observed tasks, the number of instances per task, and model sizes. We hope our dataset and model facilitate future progress towards more general-purpose NLP models.
AB - How well can NLP models generalize to a variety of unseen tasks when provided with task instructions? To address this question, we first introduce SUPER-NATURALINSTRUCTIONS, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. Our collection covers 76 distinct task types, including but not limited to classification, extraction, infilling, sequence tagging, text rewriting, and text composition. This large and diverse collection of tasks enables rigorous benchmarking of cross-task generalization under instructions: training models to follow instructions on a subset of tasks and evaluating them on the remaining unseen ones. Furthermore, we build Tk-INSTRUCT, a transformer model trained to follow a variety of in-context instructions (plain language task definitions or k-shot examples). Our experiments show that Tk-INSTRUCT outperforms existing instruction-following models such as InstructGPT by over 9% on our benchmark despite being an order of magnitude smaller. We further analyze generalization as a function of various scaling parameters, such as the number of observed tasks, the number of instances per task, and model sizes. We hope our dataset and model facilitate future progress towards more general-purpose NLP models.
UR - http://www.scopus.com/inward/record.url?scp=85143257592&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143257592&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85143257592
SP - 5085
EP - 5109
Y2 - 7 December 2022 through 11 December 2022
ER -