Nearest-neighbor-based active learning for rare category detection

Jingrui He, Jaime Carbonell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

43 Scopus citations

Abstract

Rare category detection is an open challenge for active learning, especially in the de novo case (no labeled examples), yet it is of significant practical importance for data mining, e.g., detecting new financial transaction fraud patterns where normal, legitimate transactions dominate. This paper develops a new method for detecting an instance of each minority class via an unsupervised local-density-differential sampling strategy. Essentially, a variable-scale nearest-neighbor process is used to optimize the probability of sampling tightly grouped minority classes, subject to a local smoothness assumption on the majority class. Results on both synthetic and real data sets are very positive: each minority class is detected with only a fraction of the actively sampled points required by random sampling and by Pelleg's Interleave method, the prior best technique in the sparse literature on this topic.
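To make the idea of local-density-differential sampling concrete, the following is a minimal Python sketch, not the authors' published algorithm. It assumes a fixed radius, uses the number of neighbors within that radius as a density estimate, and scores each point by how much denser it is than any point in a wider neighborhood; the highest-scoring point is the one whose label would be queried. The function names, the `radius` and `scale` parameters, and the toy grid data are all hypothetical choices made for illustration.

```python
import math


def neighbor_counts(points, radius):
    """For each point, count the other points lying within `radius` of it."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


def density_differential_scores(points, radius, scale):
    """Score each point by its largest density drop relative to any point
    within `scale * radius` (assumed scoring rule, for illustration only)."""
    counts = neighbor_counts(points, radius)
    scores = []
    for i, p in enumerate(points):
        best = 0
        for j, q in enumerate(points):
            if j != i and math.dist(p, q) <= scale * radius:
                best = max(best, counts[i] - counts[j])
        scores.append(best)
    return scores


def pick_query(points, radius, scale):
    """Return the index of the point with the highest score (first on ties)."""
    scores = density_differential_scores(points, radius, scale)
    return max(range(len(points)), key=scores.__getitem__)


# Toy data: a regular grid plays the role of a smooth majority class ...
majority = [(float(x), float(y)) for x in range(11) for y in range(11)]
# ... and a tight cluster between grid nodes plays the minority class.
offsets = [(0, 0), (0.05, 0), (-0.05, 0), (0, 0.05),
           (0, -0.05), (0.03, 0.03), (-0.03, -0.03), (0.03, -0.03)]
minority = [(5.5 + dx, 5.5 + dy) for dx, dy in offsets]

points = majority + minority
query = pick_query(points, radius=0.2, scale=5.0)
print("query index:", query, "is minority:", query >= len(majority))
```

In this toy setup the uniform grid stands in for a locally smooth majority class, so the largest density drop occurs at the edge of the tight cluster and the selected query falls inside it. The "variable-scale" aspect mentioned in the abstract would correspond to varying `scale` or `radius` across queries, which this sketch leaves out.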

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference
State: Published - Dec 1 2009
Event: 21st Annual Conference on Neural Information Processing Systems, NIPS 2007 - Vancouver, BC, Canada
Duration: Dec 3 2007 to Dec 6 2007

Publication series

Name: Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference

Other

Other: 21st Annual Conference on Neural Information Processing Systems, NIPS 2007
Country/Territory: Canada
City: Vancouver, BC
Period: 12/3/07 to 12/6/07

ASJC Scopus subject areas

  • Information Systems
