TY - GEN

T1 - Nearest-neighbor-based active learning for rare category detection

AU - He, Jingrui

AU - Carbonell, Jaime

PY - 2009/12/1

Y1 - 2009/12/1

N2 - Rare category detection is an open challenge for active learning, especially in the de novo case (no labeled examples), yet it is of significant practical importance for data mining, e.g., detecting new financial transaction fraud patterns where normal, legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. Essentially, a variable-scale nearest-neighbor process is used to optimize the probability of sampling tightly grouped minority classes, subject to a local smoothness assumption on the majority class. Results on both synthetic and real data sets are very positive, detecting each minority class with only a fraction of the actively sampled points required by random sampling and by Pelleg's Interleave method, the prior best technique in the sparse literature on this topic.

AB - Rare category detection is an open challenge for active learning, especially in the de novo case (no labeled examples), yet it is of significant practical importance for data mining, e.g., detecting new financial transaction fraud patterns where normal, legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. Essentially, a variable-scale nearest-neighbor process is used to optimize the probability of sampling tightly grouped minority classes, subject to a local smoothness assumption on the majority class. Results on both synthetic and real data sets are very positive, detecting each minority class with only a fraction of the actively sampled points required by random sampling and by Pelleg's Interleave method, the prior best technique in the sparse literature on this topic.

UR - http://www.scopus.com/inward/record.url?scp=84858766368&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858766368&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84858766368

SN - 160560352X

SN - 9781605603520

T3 - Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference

BT - Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference

T2 - 21st Annual Conference on Neural Information Processing Systems, NIPS 2007

Y2 - 3 December 2007 through 6 December 2007

ER -