Graph-based rare category detection

Jingrui He; Liu Yan; Richard Lawrence

doi:10.1109/ICDM.2008.122

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

42 Scopus citations

Abstract

Rare category detection is the task of identifying examples from rare classes in an unlabeled data set. It is an open challenge in machine learning and plays key roles in real applications such as financial fraud detection, network intrusion detection, astronomy, spam image detection, etc. In this paper, we develop a new graph-based method for rare category detection named GRADE. It makes use of the global similarity matrix motivated by the manifold ranking algorithm, which results in more compact clusters for the minority classes; by selecting examples from the regions where probability density changes the most, it relaxes the assumption that the majority classes and the minority classes are separable. Furthermore, when detailed information about the data set is not available, we develop a modified version of GRADE named GRADE-LI, which only needs an upper bound on the proportion of each minority class as input. Besides working with data with structured features, both GRADE and GRADE-LI can also work with graph data, which can not be handled by existing rare category detection methods. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the GRADE and GRADE-LI algorithms.

Original language	English (US)
Title of host publication	Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008
Pages	833-838
Number of pages	6
DOIs	https://doi.org/10.1109/ICDM.2008.122
State	Published - 2008
Externally published	Yes
Event	8th IEEE International Conference on Data Mining, ICDM 2008 - Pisa, Italy (Duration: Dec 15 2008 → Dec 19 2008)

Publication series

Name	Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print)	1550-4786

Other

Other	8th IEEE International Conference on Data Mining, ICDM 2008
Country/Territory	Italy
City	Pisa
Period	12/15/08 → 12/19/08

ASJC Scopus subject areas

General Engineering

Cite this

@inproceedings{a25e95c2ddfc460ebb8159076a987552,
  title = "Graph-based rare category detection",
  abstract = "Rare category detection is the task of identifying examples from rare classes in an unlabeled data set. It is an open challenge in machine learning and plays key roles in real applications such as financial fraud detection, network intrusion detection, astronomy, spam image detection, etc. In this paper, we develop a new graph-based method for rare category detection named GRADE. It makes use of the global similarity matrix motivated by the manifold ranking algorithm, which results in more compact clusters for the minority classes; by selecting examples from the regions where probability density changes the most, it relaxes the assumption that the majority classes and the minority classes are separable. Furthermore, when detailed information about the data set is not available, we develop a modified version of GRADE named GRADE-LI, which only needs an upper bound on the proportion of each minority class as input. Besides working with data with structured features, both GRADE and GRADE-LI can also work with graph data, which can not be handled by existing rare category detection methods. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the GRADE and GRADE-LI algorithms.",
  author = "Jingrui He and Liu Yan and Richard Lawrence",
  year = "2008",
  doi = "10.1109/ICDM.2008.122",
  language = "English (US)",
  isbn = "9780769535029",
  series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
  pages = "833--838",
  booktitle = "Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008",
  note = "8th IEEE International Conference on Data Mining, ICDM 2008 ; Conference date: 15-12-2008 Through 19-12-2008",
}

TY - GEN

T1 - Graph-based rare category detection

AU - He, Jingrui

AU - Yan, Liu

AU - Lawrence, Richard

PY - 2008

Y1 - 2008

N2 - Rare category detection is the task of identifying examples from rare classes in an unlabeled data set. It is an open challenge in machine learning and plays key roles in real applications such as financial fraud detection, network intrusion detection, astronomy, spam image detection, etc. In this paper, we develop a new graph-based method for rare category detection named GRADE. It makes use of the global similarity matrix motivated by the manifold ranking algorithm, which results in more compact clusters for the minority classes; by selecting examples from the regions where probability density changes the most, it relaxes the assumption that the majority classes and the minority classes are separable. Furthermore, when detailed information about the data set is not available, we develop a modified version of GRADE named GRADE-LI, which only needs an upper bound on the proportion of each minority class as input. Besides working with data with structured features, both GRADE and GRADE-LI can also work with graph data, which can not be handled by existing rare category detection methods. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the GRADE and GRADE-LI algorithms.

AB - Rare category detection is the task of identifying examples from rare classes in an unlabeled data set. It is an open challenge in machine learning and plays key roles in real applications such as financial fraud detection, network intrusion detection, astronomy, spam image detection, etc. In this paper, we develop a new graph-based method for rare category detection named GRADE. It makes use of the global similarity matrix motivated by the manifold ranking algorithm, which results in more compact clusters for the minority classes; by selecting examples from the regions where probability density changes the most, it relaxes the assumption that the majority classes and the minority classes are separable. Furthermore, when detailed information about the data set is not available, we develop a modified version of GRADE named GRADE-LI, which only needs an upper bound on the proportion of each minority class as input. Besides working with data with structured features, both GRADE and GRADE-LI can also work with graph data, which can not be handled by existing rare category detection methods. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the GRADE and GRADE-LI algorithms.

UR - http://www.scopus.com/inward/record.url?scp=67049161208&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67049161208&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2008.122

DO - 10.1109/ICDM.2008.122

M3 - Conference contribution

AN - SCOPUS:67049161208

SN - 9780769535029

T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

SP - 833

EP - 838

BT - Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008

T2 - 8th IEEE International Conference on Data Mining, ICDM 2008

Y2 - 15 December 2008 through 19 December 2008

ER -
