TY - GEN
T1 - Graph-based rare category detection
AU - He, Jingrui
AU - Liu, Yan
AU - Lawrence, Richard
PY - 2008
Y1 - 2008
N2 - Rare category detection is the task of identifying examples from rare classes in an unlabeled data set. It is an open challenge in machine learning and plays a key role in real applications such as financial fraud detection, network intrusion detection, astronomy, and spam image detection. In this paper, we develop a new graph-based method for rare category detection named GRADE. It makes use of the global similarity matrix motivated by the manifold ranking algorithm, which results in more compact clusters for the minority classes; by selecting examples from the regions where probability density changes the most, it relaxes the assumption that the majority classes and the minority classes are separable. Furthermore, when detailed information about the data set is not available, we develop a modified version of GRADE named GRADE-LI, which only needs an upper bound on the proportion of each minority class as input. Besides working with data with structured features, both GRADE and GRADE-LI can also work with graph data, which cannot be handled by existing rare category detection methods. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the GRADE and GRADE-LI algorithms.
AB - Rare category detection is the task of identifying examples from rare classes in an unlabeled data set. It is an open challenge in machine learning and plays a key role in real applications such as financial fraud detection, network intrusion detection, astronomy, and spam image detection. In this paper, we develop a new graph-based method for rare category detection named GRADE. It makes use of the global similarity matrix motivated by the manifold ranking algorithm, which results in more compact clusters for the minority classes; by selecting examples from the regions where probability density changes the most, it relaxes the assumption that the majority classes and the minority classes are separable. Furthermore, when detailed information about the data set is not available, we develop a modified version of GRADE named GRADE-LI, which only needs an upper bound on the proportion of each minority class as input. Besides working with data with structured features, both GRADE and GRADE-LI can also work with graph data, which cannot be handled by existing rare category detection methods. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the GRADE and GRADE-LI algorithms.
UR - http://www.scopus.com/inward/record.url?scp=67049161208&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=67049161208&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2008.122
DO - 10.1109/ICDM.2008.122
M3 - Conference contribution
AN - SCOPUS:67049161208
SN - 9780769535029
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 833
EP - 838
BT - Proceedings - 8th IEEE International Conference on Data Mining, ICDM 2008
T2 - 8th IEEE International Conference on Data Mining, ICDM 2008
Y2 - 15 December 2008 through 19 December 2008
ER -