TY - GEN
T1 - A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels
AU - Ding, Yifan
AU - Wang, Liqiang
AU - Fan, Deliang
AU - Gong, Boqing
N1 - Funding Information:
This work was supported in part by NSF-1741431.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/5/3
Y1 - 2018/5/3
N2 - The recent success of deep neural networks is powered in part by large-scale well-labeled training data. However, it is a daunting task to laboriously annotate an ImageNet-like dataset. In contrast, it is fairly convenient, fast, and cheap to collect training images from the Web along with their noisy labels. This signifies the need for alternative approaches to training deep neural networks using such noisy labels. Existing methods tackling this problem either try to identify and correct the wrong labels or reweight the data terms in the loss function according to the inferred noise rates. Both strategies inevitably incur errors for some of the data points. In this paper, we contend that it is actually better to ignore the labels of some of the data points than to keep them if the labels are incorrect, especially when the noise rate is high. After all, the wrong labels could mislead a neural network to a bad local optimum. We suggest a two-stage framework for learning from noisy labels. In the first stage, we identify a small portion of images from the noisy training set whose labels are correct with a high probability. The noisy labels of the other images are ignored. In the second stage, we train a deep neural network in a semi-supervised manner. This framework effectively takes advantage of the whole training set and yet only a portion of its labels that are most likely correct. Experiments on three datasets verify the effectiveness of our approach, especially when the noise rate is high.
AB - The recent success of deep neural networks is powered in part by large-scale well-labeled training data. However, it is a daunting task to laboriously annotate an ImageNet-like dataset. In contrast, it is fairly convenient, fast, and cheap to collect training images from the Web along with their noisy labels. This signifies the need for alternative approaches to training deep neural networks using such noisy labels. Existing methods tackling this problem either try to identify and correct the wrong labels or reweight the data terms in the loss function according to the inferred noise rates. Both strategies inevitably incur errors for some of the data points. In this paper, we contend that it is actually better to ignore the labels of some of the data points than to keep them if the labels are incorrect, especially when the noise rate is high. After all, the wrong labels could mislead a neural network to a bad local optimum. We suggest a two-stage framework for learning from noisy labels. In the first stage, we identify a small portion of images from the noisy training set whose labels are correct with a high probability. The noisy labels of the other images are ignored. In the second stage, we train a deep neural network in a semi-supervised manner. This framework effectively takes advantage of the whole training set and yet only a portion of its labels that are most likely correct. Experiments on three datasets verify the effectiveness of our approach, especially when the noise rate is high.
UR - http://www.scopus.com/inward/record.url?scp=85051011000&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85051011000&partnerID=8YFLogxK
U2 - 10.1109/WACV.2018.00138
DO - 10.1109/WACV.2018.00138
M3 - Conference contribution
AN - SCOPUS:85051011000
T3 - Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
SP - 1215
EP - 1224
BT - Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018
Y2 - 12 March 2018 through 15 March 2018
ER -