On consistency of graph-based semi-supervised learning

Chengan Du, Yunpeng Zhao, Feng Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Graph-based semi-supervised learning is one of the most popular methods in machine learning. Some of its theoretical properties, such as bounds for the generalization error and the convergence of the graph Laplacian regularizer, have been studied in the computer science and statistics literature. However, a fundamental statistical property, consistency, has not been proved. Here consistency means that the prediction by the algorithm can identify the underlying truth with unlimited data; it is not to be confused with the existence of solutions in an equation system, which is the sense of the term used in algebra. In this article, we study the consistency problem under a non-parametric framework. We obtain the following two results: 1) We prove that graph-based semi-supervised learning on the test data is consistent in the case that the estimated scores are enforced to be equal to the observed responses for the labeled data (the hard criterion). The sample size of unlabeled data is allowed to grow at a slower rate than the size of the labeled data in this result. 2) We give a counterexample demonstrating that the estimator can be inconsistent in the case that the estimated scores are not required to be equal to the observed responses (the soft criterion), where a tuning parameter is used to balance the loss function and the graph Laplacian regularizer. These somewhat surprising theoretical findings are supported by numerical studies on both synthetic and real datasets. Moreover, numerical studies show that the hard criterion consistently outperforms the soft criterion, even when the sample size of unlabeled data is smaller than the size of labeled data. This suggests that practitioners can safely choose the hard criterion without the burden of selecting the tuning parameter in the soft criterion.
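
The two criteria named in the abstract can be illustrated with a minimal sketch. It assumes a dense Gaussian-kernel graph and the standard harmonic-function formulation of graph-based semi-supervised learning. The function names, the kernel, the bandwidth, and the exact form of the soft objective are illustrative assumptions and are not taken from the paper itself.

```python
import numpy as np


def gaussian_weights(X, bandwidth):
    """Dense Gaussian similarity matrix with a zero diagonal (assumed kernel)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W


def hard_criterion(W, y_labeled):
    """Scores for the unlabeled points, with labeled scores fixed to y_labeled.

    Assumes the first len(y_labeled) rows/columns of W are the labeled points.
    Minimizing f^T L f subject to f_labeled = y gives L_uu f_u = W_ul y.
    """
    n = len(y_labeled)
    L = graph_laplacian(W)
    return np.linalg.solve(L[n:, n:], W[n:, :n] @ y_labeled)


def soft_criterion(W, y_labeled, lam):
    """Scores for all points minimizing
    sum over labeled i of (f_i - y_i)^2 + lam * f^T L f,
    where lam > 0 is the tuning parameter (assumed form of the objective).
    """
    n = len(y_labeled)
    L = graph_laplacian(W)
    J = np.zeros_like(W)
    J[np.arange(n), np.arange(n)] = 1.0  # selects the labeled points
    rhs = np.zeros(W.shape[0])
    rhs[:n] = y_labeled
    # Setting the gradient to zero yields (J + lam * L) f = rhs.
    return np.linalg.solve(J + lam * L, rhs)
```

In this sketch, a very small `lam` makes the soft scores approach the hard-criterion scores, while a very large `lam` pulls every score toward the mean of the labeled responses, which shows how strongly the soft criterion depends on the tuning parameter.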

Original language: English (US)
Title of host publication: Proceedings - 2019 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 483-491
Number of pages: 9
ISBN (Electronic): 9781728125190
DOIs
State: Published - Jul 2019
Event: 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019 - Richardson, United States
Duration: Jul 7 2019 → Jul 9 2019

Publication series

Name: Proceedings - International Conference on Distributed Computing Systems
Volume: 2019-July

Conference

Conference: 39th IEEE International Conference on Distributed Computing Systems, ICDCS 2019
Country/Territory: United States
City: Richardson
Period: 7/7/19 → 7/9/19

Keywords

  • Consistency
  • Graph Laplacian
  • Semi-supervised learning

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
