TY - GEN
T1 - Understanding the Role of Mixup in Knowledge Distillation
T2 - 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023
AU - Choi, Hongjun
AU - Jeon, Eun Som
AU - Shukla, Ankita
AU - Turaga, Pavan
N1 - Funding Information:
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00112290073. Approved for public release; distribution is unlimited.
Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Mixup is a popular data augmentation technique that creates new samples by linear interpolation between two given data samples, improving both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning, and involves using a larger network's implicit knowledge to guide the learning of a smaller network. At first glance, these two techniques seem very different; however, we find that "smoothness" is the connecting link between the two and is also a crucial attribute in understanding KD's interplay with mixup. Although many mixup variants and distillation methods have been proposed, much remains to be understood regarding the role of mixup in knowledge distillation. In this paper, we present a detailed empirical study of several important dimensions of compatibility between mixup and knowledge distillation. We also scrutinize the behavior of networks trained with mixup in light of knowledge distillation through extensive analysis, visualizations, and comprehensive experiments on image classification. Finally, based on our findings, we suggest improved strategies to guide the student network and enhance its effectiveness. Additionally, the findings of this study provide insightful suggestions to researchers and practitioners who commonly use techniques from KD. Our code is available at https://github.com/hchoi71/MIX-KD.
AB - Mixup is a popular data augmentation technique that creates new samples by linear interpolation between two given data samples, improving both the generalization and robustness of the trained model. Knowledge distillation (KD), on the other hand, is widely used for model compression and transfer learning, and involves using a larger network's implicit knowledge to guide the learning of a smaller network. At first glance, these two techniques seem very different; however, we find that "smoothness" is the connecting link between the two and is also a crucial attribute in understanding KD's interplay with mixup. Although many mixup variants and distillation methods have been proposed, much remains to be understood regarding the role of mixup in knowledge distillation. In this paper, we present a detailed empirical study of several important dimensions of compatibility between mixup and knowledge distillation. We also scrutinize the behavior of networks trained with mixup in light of knowledge distillation through extensive analysis, visualizations, and comprehensive experiments on image classification. Finally, based on our findings, we suggest improved strategies to guide the student network and enhance its effectiveness. Additionally, the findings of this study provide insightful suggestions to researchers and practitioners who commonly use techniques from KD. Our code is available at https://github.com/hchoi71/MIX-KD.
KW - adversarial attack and defense methods
KW - Adversarial learning
KW - Algorithms: Image recognition and understanding (object detection, categorization, segmentation)
UR - http://www.scopus.com/inward/record.url?scp=85149010130&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85149010130&partnerID=8YFLogxK
U2 - 10.1109/WACV56688.2023.00235
DO - 10.1109/WACV56688.2023.00235
M3 - Conference contribution
AN - SCOPUS:85149010130
T3 - Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
SP - 2318
EP - 2327
BT - Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 3 January 2023 through 7 January 2023
ER -