Using ghost edges for classification in sparsely labeled networks

Brian Gallagher; Hanghang Tong; Tina Eliassi-Rad; Christos Faloutsos

doi:10.1145/1401890.1401925


Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

130 Scopus citations

Abstract

We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.
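The abstract says that ghost edges let information flow from labeled to unlabeled nodes, and that the method runs in time proportional to L · E. Below is a minimal Python sketch of that idea. It is not the authors' implementation.

The sketch makes these assumptions:

- **Ghost-edge weights.** Each ghost edge from a labeled node l to an unlabeled node u is weighted by a random-walk-with-restart proximity score. This choice is suggested by the "Random walk" keyword.
- **Classifier.** A plain proximity-weighted vote stands in for the paper's classifiers.
- **Names and parameters.** The function names `rwr_scores` and `ghost_edge_predict` are hypothetical, as are the values `restart=0.15` and `iters=50`.

Running one walk per labeled node, where each iteration touches every edge once, gives the L · E scaling for a fixed iteration count.

```python
from collections import defaultdict


def rwr_scores(adj, seed, restart=0.15, iters=50):
    """Random walk with restart (RWR) from `seed`, computed by power iteration.

    adj maps each node to its neighbour list (undirected graph, each edge
    listed in both directions). Each iteration visits every edge once, so
    one call costs O(E * iters).
    """
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for u, nbrs in adj.items():
            if not nbrs:
                continue  # isolated node: no outgoing mass to spread
            share = (1.0 - restart) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        nxt[seed] += restart  # jump back to the seed with probability `restart`
        p = nxt
    return p


def ghost_edge_predict(adj, labels, restart=0.15, iters=50):
    """Label unlabeled nodes by a vote over RWR-weighted ghost edges.

    ASSUMPTION (hypothetical helper): a ghost edge l -> u gets weight
    rwr_scores(adj, l)[u], and the classifier is a plain weighted vote,
    a stand-in for the paper's models. One RWR per labeled node gives
    O(L * E * iters) total work.
    """
    votes = defaultdict(lambda: defaultdict(float))
    for l, cls in labels.items():
        for u, w in rwr_scores(adj, l, restart, iters).items():
            if u not in labels and w > 0.0:
                votes[u][cls] += w  # ghost edge from labeled l to unlabeled u
    # Predict the class with the largest total ghost-edge weight.
    return {u: max(c, key=c.get) for u, c in votes.items()}


if __name__ == "__main__":
    # Hypothetical toy graph: the path a-b-c-d-e, with only the ends labeled.
    adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
           "d": ["c", "e"], "e": ["d"]}
    print(ghost_edge_predict(adj, {"a": "x", "e": "y"}))
```

In the toy example, only a and e are labeled, and b and d have no labeled neighbours beyond one endpoint each. Even so, the ghost edges assign b to class x and d to class y. Node c is equidistant from both labels, so its vote is essentially a tie.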

Original language	English (US)
Title of host publication	KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages	256-264
Number of pages	9
DOIs	https://doi.org/10.1145/1401890.1401925
State	Published - 2008
Externally published	Yes
Event	14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 - Las Vegas, NV, United States. Duration: Aug 24 2008 → Aug 27 2008

Publication series

Name	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other	14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008
Country/Territory	United States
City	Las Vegas, NV
Period	8/24/08 → 8/27/08

Keywords

Collective classification
Random walk
Semi-supervised learning
Statistical relational learning

ASJC Scopus subject areas

Software
Information Systems


Cite this

Gallagher, B., Tong, H., Eliassi-Rad, T., & Faloutsos, C. (2008). Using ghost edges for classification in sparsely labeled networks. In KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 256-264). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). https://doi.org/10.1145/1401890.1401925

Using ghost edges for classification in sparsely labeled networks. / Gallagher, Brian; Tong, Hanghang; Eliassi-Rad, Tina et al.
KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008. p. 256-264 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).


Gallagher, B, Tong, H, Eliassi-Rad, T & Faloutsos, C 2008, Using ghost edges for classification in sparsely labeled networks. in KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 256-264, 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008, Las Vegas, NV, United States, 8/24/08. https://doi.org/10.1145/1401890.1401925

@inproceedings{dbbe2a4be89c40fa92fd1975b943b224,

title = "Using ghost edges for classification in sparsely labeled networks",

abstract = "We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding {"}ghost edges{"} to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.",

keywords = "Collective classification, Random walk, Semi-supervised learning, Statistical relational learning",

author = "Brian Gallagher and Hanghang Tong and Tina Eliassi-Rad and Christos Faloutsos",

year = "2008",

doi = "10.1145/1401890.1401925",

language = "English (US)",

isbn = "9781605581934",

series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

pages = "256--264",

booktitle = "KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

note = "14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008 ; Conference date: 24-08-2008 Through 27-08-2008",

}

TY - GEN

T1 - Using ghost edges for classification in sparsely labeled networks

AU - Gallagher, Brian

AU - Tong, Hanghang

AU - Eliassi-Rad, Tina

AU - Faloutsos, Christos

PY - 2008

Y1 - 2008

N2 - We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.

AB - We address the problem of classification in partially labeled networks (a.k.a. within-network classification) where observed class labels are sparse. Techniques for statistical relational learning have been shown to perform well on network classification tasks by exploiting dependencies between class labels of neighboring nodes. However, relational classifiers can fail when unlabeled nodes have too few labeled neighbors to support learning (during training phase) and/or inference (during testing phase). This situation arises in real-world problems when observed labels are sparse. In this paper, we propose a novel approach to within-network classification that combines aspects of statistical relational learning and semi-supervised learning to improve classification performance in sparse networks. Our approach works by adding "ghost edges" to a network, which enable the flow of information from labeled to unlabeled nodes. Through experiments on real-world data sets, we demonstrate that our approach performs well across a range of conditions where existing approaches, such as collective classification and semi-supervised learning, fail. On all tasks, our approach improves area under the ROC curve (AUC) by up to 15 points over existing approaches. Furthermore, we demonstrate that our approach runs in time proportional to L · E, where L is the number of labeled nodes and E is the number of edges.

KW - Collective classification

KW - Random walk

KW - Semi-supervised learning

KW - Statistical relational learning

UR - http://www.scopus.com/inward/record.url?scp=65449133627&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=65449133627&partnerID=8YFLogxK

U2 - 10.1145/1401890.1401925

DO - 10.1145/1401890.1401925

M3 - Conference contribution

AN - SCOPUS:65449133627

SN - 9781605581934

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 256

EP - 264

BT - KDD 2008 - Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

T2 - 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2008

Y2 - 24 August 2008 through 27 August 2008

ER -
