A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels

Yifan Ding; Liqiang Wang; Deliang Fan; Boqing Gong

doi:10.1109/WACV.2018.00138

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

70 Scopus citations

Abstract

The recent success of deep neural networks is powered in part by large-scale, well-labeled training data. However, laboriously annotating an ImageNet-like dataset is a daunting task. In contrast, it is fairly convenient, fast, and cheap to collect training images from the Web along with their noisy labels. This signifies the need for alternative approaches to training deep neural networks using such noisy labels. Existing methods tackling this problem either try to identify and correct the wrong labels or reweight the data terms in the loss function according to the inferred noise rates. Both strategies inevitably incur errors for some of the data points. In this paper, we contend that it is actually better to ignore the labels of some of the data points than to keep them if the labels are incorrect, especially when the noise rate is high. After all, the wrong labels could mislead a neural network to a bad local optimum. We suggest a two-stage framework for learning from noisy labels. In the first stage, we identify a small portion of images from the noisy training set whose labels are correct with high probability. The noisy labels of the other images are ignored. In the second stage, we train a deep neural network in a semi-supervised manner. This framework effectively takes advantage of the whole training set, yet uses only the portion of its labels that are most likely correct. Experiments on three datasets verify the effectiveness of our approach, especially when the noise rate is high.
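The abstract describes the framework only at a high level. The following is a toy, stdlib-only illustration of the general two-stage idea, not the authors' implementation. Everything in it is a hypothetical stand-in: the clean-subset rule in `select_clean` (keep a noisy label only when a nearest-centroid classifier fit on all noisy labels agrees with it by at least `margin`), the `self_train` loop used as the semi-supervised stage, and the toy data. A real pipeline would use deep networks for both stages.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def centroids(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, Point]:
    """Mean feature vector of each class present in `labels`."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def nearest(p: Point, cents: Dict[int, Point]) -> int:
    """Class whose centroid is closest to p (Euclidean distance)."""
    return min(cents, key=lambda c: math.dist(p, cents[c]))


def select_clean(points: Sequence[Point], noisy: Sequence[int],
                 margin: float = 0.5) -> List[Optional[int]]:
    """Stage 1 (hypothetical rule): keep a noisy label only if the nearest
    centroid, fit on all noisy labels, agrees with it and beats the runner-up
    by at least `margin`; otherwise drop the label (None)."""
    cents = centroids(points, noisy)
    kept: List[Optional[int]] = []
    for p, y in zip(points, noisy):
        ranked = sorted(cents, key=lambda c: math.dist(p, cents[c]))
        gap = (math.dist(p, cents[ranked[1]]) - math.dist(p, cents[ranked[0]])
               if len(ranked) > 1 else math.inf)
        kept.append(y if ranked[0] == y and gap >= margin else None)
    return kept


def self_train(points: Sequence[Point], partial: Sequence[Optional[int]],
               rounds: int = 5) -> List[int]:
    """Stage 2 (simple self-training stand-in for semi-supervised learning):
    fit centroids on the trusted labels, pseudo-label the unlabeled points,
    then refit on everything. Trusted labels are never overwritten."""
    trusted = [(p, y) for p, y in zip(points, partial) if y is not None]
    if not trusted:
        raise ValueError("stage 1 kept no labels")
    cents = centroids([p for p, _ in trusted], [y for _, y in trusted])
    preds = [y if y is not None else nearest(p, cents)
             for p, y in zip(points, partial)]
    for _ in range(rounds):
        cents = centroids(points, preds)  # refit on trusted + pseudo-labels
        preds = [y if y is not None else nearest(p, cents)
                 for p, y in zip(points, partial)]
    return preds


# Toy data: two clusters; the labels of (1, 1) and (6, 6) are flipped.
POINTS: List[Point] = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
NOISY = [0, 0, 0, 1, 1, 1, 1, 0]

if __name__ == "__main__":
    kept = select_clean(POINTS, NOISY)
    print(kept)                      # [0, 0, 0, None, 1, 1, 1, None]
    print(self_train(POINTS, kept))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

On this toy input, stage 1 drops the two flipped labels instead of trying to correct them, and stage 2 then assigns both points to their cluster's class. This mirrors the abstract's argument that ignoring doubtful labels can be safer than keeping wrong ones; the `margin` value of 0.5 is arbitrary.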

Original language	English (US)
Title of host publication	Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	1215-1224
Number of pages	10
ISBN (Electronic)	9781538648865
DOIs	https://doi.org/10.1109/WACV.2018.00138
State	Published - May 3 2018
Externally published	Yes
Event	18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018 - Lake Tahoe, United States. Duration: Mar 12 2018 → Mar 15 2018

Publication series

Name	Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018
Volume	2018-January

Other

Other	18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018
Country/Territory	United States
City	Lake Tahoe
Period	3/12/18 → 3/15/18

ASJC Scopus subject areas

Computer Vision and Pattern Recognition
Computer Science Applications

Cite this

Ding, Y., Wang, L., Fan, D., & Gong, B. (2018). A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels. In Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018 (pp. 1215-1224). Article 8354242 (Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018; Vol. 2018-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/WACV.2018.00138

A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels. / Ding, Yifan; Wang, Liqiang; Fan, Deliang et al.
Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018. Institute of Electrical and Electronics Engineers Inc., 2018. p. 1215-1224, 8354242 (Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018; Vol. 2018-January).

Ding, Y, Wang, L, Fan, D & Gong, B 2018, A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels. in Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, 8354242, Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, vol. 2018-January, Institute of Electrical and Electronics Engineers Inc., pp. 1215-1224, 18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, United States, 3/12/18. https://doi.org/10.1109/WACV.2018.00138

Ding Y, Wang L, Fan D, Gong B. A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels. In Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018. Institute of Electrical and Electronics Engineers Inc. 2018. p. 1215-1224. 8354242. (Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018). doi: 10.1109/WACV.2018.00138

Ding, Yifan ; Wang, Liqiang ; Fan, Deliang et al. / A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels. Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 1215-1224 (Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018).

@inproceedings{7f622fb056af4c29ab5f5e59c14325dc,

title = "A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels",

abstract = "The recent success of deep neural networks is powered in part by large-scale, well-labeled training data. However, laboriously annotating an ImageNet-like dataset is a daunting task. In contrast, it is fairly convenient, fast, and cheap to collect training images from the Web along with their noisy labels. This signifies the need for alternative approaches to training deep neural networks using such noisy labels. Existing methods tackling this problem either try to identify and correct the wrong labels or reweight the data terms in the loss function according to the inferred noise rates. Both strategies inevitably incur errors for some of the data points. In this paper, we contend that it is actually better to ignore the labels of some of the data points than to keep them if the labels are incorrect, especially when the noise rate is high. After all, the wrong labels could mislead a neural network to a bad local optimum. We suggest a two-stage framework for learning from noisy labels. In the first stage, we identify a small portion of images from the noisy training set whose labels are correct with high probability. The noisy labels of the other images are ignored. In the second stage, we train a deep neural network in a semi-supervised manner. This framework effectively takes advantage of the whole training set, yet uses only the portion of its labels that are most likely correct. Experiments on three datasets verify the effectiveness of our approach, especially when the noise rate is high.",

author = "Yifan Ding and Liqiang Wang and Deliang Fan and Boqing Gong",

note = "Funding Information: This work was supported in part by NSF-1741431. Publisher Copyright: {\textcopyright} 2018 IEEE.; 18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018 ; Conference date: 12-03-2018 Through 15-03-2018",

year = "2018",

month = may,

day = "3",

doi = "10.1109/WACV.2018.00138",

language = "English (US)",

series = "Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "1215--1224",

booktitle = "Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018",

}

TY - GEN

T1 - A Semi-Supervised Two-Stage Approach to Learning from Noisy Labels

AU - Ding, Yifan

AU - Wang, Liqiang

AU - Fan, Deliang

AU - Gong, Boqing

PY - 2018/5/3

Y1 - 2018/5/3

N2 - The recent success of deep neural networks is powered in part by large-scale, well-labeled training data. However, laboriously annotating an ImageNet-like dataset is a daunting task. In contrast, it is fairly convenient, fast, and cheap to collect training images from the Web along with their noisy labels. This signifies the need for alternative approaches to training deep neural networks using such noisy labels. Existing methods tackling this problem either try to identify and correct the wrong labels or reweight the data terms in the loss function according to the inferred noise rates. Both strategies inevitably incur errors for some of the data points. In this paper, we contend that it is actually better to ignore the labels of some of the data points than to keep them if the labels are incorrect, especially when the noise rate is high. After all, the wrong labels could mislead a neural network to a bad local optimum. We suggest a two-stage framework for learning from noisy labels. In the first stage, we identify a small portion of images from the noisy training set whose labels are correct with high probability. The noisy labels of the other images are ignored. In the second stage, we train a deep neural network in a semi-supervised manner. This framework effectively takes advantage of the whole training set, yet uses only the portion of its labels that are most likely correct. Experiments on three datasets verify the effectiveness of our approach, especially when the noise rate is high.

AB - The recent success of deep neural networks is powered in part by large-scale, well-labeled training data. However, laboriously annotating an ImageNet-like dataset is a daunting task. In contrast, it is fairly convenient, fast, and cheap to collect training images from the Web along with their noisy labels. This signifies the need for alternative approaches to training deep neural networks using such noisy labels. Existing methods tackling this problem either try to identify and correct the wrong labels or reweight the data terms in the loss function according to the inferred noise rates. Both strategies inevitably incur errors for some of the data points. In this paper, we contend that it is actually better to ignore the labels of some of the data points than to keep them if the labels are incorrect, especially when the noise rate is high. After all, the wrong labels could mislead a neural network to a bad local optimum. We suggest a two-stage framework for learning from noisy labels. In the first stage, we identify a small portion of images from the noisy training set whose labels are correct with high probability. The noisy labels of the other images are ignored. In the second stage, we train a deep neural network in a semi-supervised manner. This framework effectively takes advantage of the whole training set, yet uses only the portion of its labels that are most likely correct. Experiments on three datasets verify the effectiveness of our approach, especially when the noise rate is high.

UR - http://www.scopus.com/inward/record.url?scp=85051011000&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051011000&partnerID=8YFLogxK

U2 - 10.1109/WACV.2018.00138

DO - 10.1109/WACV.2018.00138

M3 - Conference contribution

AN - SCOPUS:85051011000

T3 - Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018

SP - 1215

EP - 1224

BT - Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 18th IEEE Winter Conference on Applications of Computer Vision, WACV 2018

Y2 - 12 March 2018 through 15 March 2018

ER -
