Using ghost edges for classification in sparsely labeled networks

Brian Gallagher, Hanghang Tong, Tina Eliassi-Rad, Christos Faloutsos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

109 Citations (Scopus)

Abstract

We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.
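
The abstract does not spell out how ghost edges are built or weighted, so the following is only an illustrative sketch of the general idea. It assumes, based on the "Random walk" keyword, that a ghost edge links each labeled node to each unlabeled node and is weighted by a random-walk-with-restart proximity score. The function names, the restart probability of 0.15, the iteration count, and the toy graph are all hypothetical. One walk is run per labeled node, and each iteration of a walk touches every edge once, which fits the stated running time proportional to L · E.

```python
from collections import defaultdict


def rwr_scores(adj, source, restart=0.15, iters=50):
    """Approximate random-walk-with-restart proximity from `source` to every node.

    `adj` maps each node to a list of its neighbors (undirected graph).
    Each iteration visits every edge once, so one call costs O(iters * E).
    """
    p = {v: 0.0 for v in adj}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u, neighbors in adj.items():
            walk_mass = (1.0 - restart) * p[u]
            if neighbors:
                share = walk_mass / len(neighbors)
                for v in neighbors:
                    nxt[v] += share
            else:
                # Assumption: a walker stuck on an isolated node jumps back.
                nxt[source] += walk_mass
        nxt[source] += restart  # restart mass always returns to the source
        p = nxt
    return p


def classify_with_ghost_edges(adj, labels, restart=0.15, iters=50):
    """Label each unlabeled node by the class with the largest total ghost-edge weight.

    One walk per labeled node gives L walks of O(E) work per iteration.
    """
    votes = {v: defaultdict(float) for v in adj if v not in labels}
    for labeled_node, cls in labels.items():
        proximity = rwr_scores(adj, labeled_node, restart, iters)
        for v in votes:
            # Ghost edge labeled_node -- v, weighted by proximity.
            votes[v][cls] += proximity[v]
    return {v: max(w, key=w.get) for v, w in votes.items()}


# Toy path graph a - b - c - d - e - f with only the two endpoints labeled.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f"],
    "f": ["e"],
}
labels = {"a": "x", "f": "y"}
predicted = classify_with_ghost_edges(adj, labels)
```

In this toy graph, nodes c and d have no labeled neighbors at all, which is the sparse-label situation the abstract describes. The ghost-edge weights still give each of them a vote from both labeled endpoints.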

Original language: English (US)
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 256-264
Number of pages: 9
DOIs: https://doi.org/10.1145/1401890.1401925
State: Published - 2008
Externally published: Yes
Event: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 - Las Vegas, NV, United States
Duration: Aug 24 2008 → Aug 27 2008

Other

Other: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008
Country: United States
City: Las Vegas, NV
Period: 8/24/08 → 8/27/08

Fingerprint

Labels
Supervised learning
Classifiers
Testing
Experiments

Keywords

  • Collective classification
  • Random walk
  • Semi-supervised learning
  • Statistical relational learning

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Gallagher, B., Tong, H., Eliassi-Rad, T., & Faloutsos, C. (2008). Using ghost edges for classification in sparsely labeled networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 256-264) https://doi.org/10.1145/1401890.1401925

@inproceedings{dbbe2a4be89c40fa92fd1975b943b224,
title = "Using ghost edges for classification in sparsely labeled networks",
abstract = "We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding {"}ghost edges{"} to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.",
keywords = "Collective classification, Random walk, Semi-supervised learning, Statistical relational learning",
author = "Brian Gallagher and Hanghang Tong and Tina Eliassi-Rad and Christos Faloutsos",
year = "2008",
doi = "10.1145/1401890.1401925",
language = "English (US)",
isbn = "9781605581934",
pages = "256--264",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Using ghost edges for classification in sparsely labeled networks

AU - Gallagher, Brian

AU - Tong, Hanghang

AU - Eliassi-Rad, Tina

AU - Faloutsos, Christos

PY - 2008

Y1 - 2008

N2 - We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.

KW - Collective classification

KW - Random walk

KW - Semi-supervised learning

KW - Statistical relational learning

UR - http://www.scopus.com/inward/record.url?scp=65449133627&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=65449133627&partnerID=8YFLogxK

U2 - 10.1145/1401890.1401925

DO - 10.1145/1401890.1401925

M3 - Conference contribution

AN - SCOPUS:65449133627

SN - 9781605581934

SP - 256

EP - 264

BT - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ER -