TY - GEN
T1 - Towards Improving Selective Prediction Ability of NLP Systems
AU - Varshney, Neeraj
AU - Mishra, Swaroop
AU - Baral, Chitta
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
N2 - It’s better to say “I can’t answer” than to answer incorrectly. This selective prediction ability is crucial for NLP systems to be reliably deployed in real-world applications. Prior work has shown that existing selective prediction techniques fail to perform well, especially in the out-of-domain setting. In this work, we propose a method that improves the probability estimates of models by calibrating them using the prediction confidence and difficulty score of instances. Using these two signals, we first annotate held-out instances and then train a calibrator to predict the likelihood of correctness of the model’s prediction. We instantiate our method with the Natural Language Inference (NLI) and Duplicate Detection (DD) tasks and evaluate it in both In-Domain (IID) and Out-of-Domain (OOD) settings. In the (IID, OOD) settings, we show that the representations learned by our calibrator result in an improvement of (15.81%, 5.64%) and (6.19%, 13.9%) over MaxProb (a selective prediction baseline) on the NLI and DD tasks, respectively.
UR - http://www.scopus.com/inward/record.url?scp=85137593267&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137593267&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85137593267
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 221
EP - 226
BT - ACL 2022 - 7th Workshop on Representation Learning for NLP, RepL4NLP 2022 - Proceedings of the Workshop
PB - Association for Computational Linguistics (ACL)
T2 - 7th Workshop on Representation Learning for NLP, RepL4NLP 2022 at ACL 2022
Y2 - 26 May 2022
ER -