Prior-free rare category detection

Jingrui He; Jaime Carbonell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Rare category detection is an open challenge in machine learning. It plays a central role in applications such as detecting new financial fraud patterns, detecting new network malware, and scientific discovery, where rare categories are hidden among huge volumes of normal data and observations. In this paper, we propose a new method for rare category detection named SEDER, which requires no prior information about the data set. It implicitly performs semiparametric density estimation using specially designed exponential families, and then picks for labeling the examples where the neighborhood density changes the most. SEDER also works when the data are not separable. What distinguishes it from all existing methods is its prior-free nature, i.e., it does not require any prior information about the data set (e.g., the number of classes or the proportions of the different classes), which makes it more suitable for real applications. Experimental results on both synthetic and real data sets demonstrate the superiority of SEDER.
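The abstract describes the selection rule only at a high level: estimate the density, then query where the neighborhood density changes the most. The sketch below illustrates that general idea; it is not SEDER. It swaps in a plain Gaussian kernel density estimate with per-dimension bandwidths from Scott's rule in place of the paper's semiparametric exponential-family estimate, and the neighborhood size k, the "largest density drop to a neighbor" score, and the neighbor-blocking step in select_queries are all hypothetical choices. Note that k and the bandwidth are exactly the kind of tuning parameters a prior-free method is meant to avoid.

```python
import numpy as np


def gaussian_kde_per_dim(X, bandwidth):
    """Gaussian kernel density estimate at every row of X.

    X: (n, d) data; bandwidth: (d,) per-dimension kernel widths.
    Returns an (n,) array of densities (each point's own kernel included).
    """
    diff = (X[:, None, :] - X[None, :, :]) / bandwidth   # (n, n, d) scaled differences
    sq = np.sum(diff ** 2, axis=2)                        # scaled squared distances
    norm = np.prod(np.sqrt(2.0 * np.pi) * bandwidth)      # product-kernel normalizer
    return np.exp(-0.5 * sq).sum(axis=1) / (len(X) * norm)


def density_change_scores(X, k=10, bandwidth=None):
    """Score each point by its largest density drop to one of its k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    k = min(k, n - 1)
    if bandwidth is None:
        # Assumption: Scott's rule, h_j = std_j * n^(-1/(d+4)).
        bandwidth = X.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))
    bandwidth = np.broadcast_to(np.asarray(bandwidth, dtype=float), (d,))
    dens = gaussian_kde_per_dim(X, bandwidth)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)                 # a point is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :k]           # (n, k) nearest-neighbor indices
    scores = np.max(dens[:, None] - dens[nn], axis=1)
    return scores, nn


def select_queries(X, n_queries, k=10, bandwidth=None):
    """Greedily pick points to send to a labeling oracle, highest score first.

    Hypothetical heuristic: once a point is picked, it and its k nearest
    neighbors are blocked so that later queries explore other regions.
    """
    scores, nn = density_change_scores(X, k, bandwidth)
    blocked = np.zeros(len(scores), dtype=bool)
    queries = []
    for _ in range(n_queries):
        cand = np.where(blocked, -np.inf, scores)
        i = int(np.argmax(cand))
        if np.isneginf(cand[i]):                   # every point is already blocked
            break
        queries.append(i)
        blocked[i] = True
        blocked[nn[i]] = True
    return queries
```

On a toy set made of an 11 x 11 integer grid of majority points plus a tight five-point cluster placed between grid points, select_queries(X, 3, k=8, bandwidth=0.3) returns one of the cluster points first, because the cluster's estimated density is far higher than that of the surrounding grid points. The full pairwise arrays make this O(n^2) in memory, so it only suits small illustrative data sets.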

Original language	English (US)
Title of host publication	Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133
Pages	154-162
Number of pages	9
State	Published - 2009
Externally published	Yes
Event	9th SIAM International Conference on Data Mining 2009, SDM 2009 - Sparks, NV, United States Duration: Apr 30 2009 → May 2 2009

Publication series

Name	Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics
Volume	1

Other

Other	9th SIAM International Conference on Data Mining 2009, SDM 2009
Country/Territory	United States
City	Sparks, NV
Period	4/30/09 → 5/2/09

ASJC Scopus subject areas

Computational Theory and Mathematics
Software
Applied Mathematics

Cite this

He, J., & Carbonell, J. (2009). Prior-free rare category detection. In Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133 (pp. 154-162). (Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics; Vol. 1).


@inproceedings{d54fa88eea6f44998ea9c31fb9e6f3ac,

title = "Prior-free rare category detection",

abstract = "Rare category detection is an open challenge in machine learning. It plays a central role in applications such as detecting new financial fraud patterns, detecting new network malware, and scientific discovery, where rare categories are hidden among huge volumes of normal data and observations. In this paper, we propose a new method for rare category detection named SEDER, which requires no prior information about the data set. It implicitly performs semiparametric density estimation using specially designed exponential families, and then picks for labeling the examples where the neighborhood density changes the most. SEDER also works when the data are not separable. What distinguishes it from all existing methods is its prior-free nature, i.e., it does not require any prior information about the data set (e.g., the number of classes or the proportions of the different classes), which makes it more suitable for real applications. Experimental results on both synthetic and real data sets demonstrate the superiority of SEDER.",

author = "Jingrui He and Jaime Carbonell",

year = "2009",

language = "English (US)",

isbn = "9781615671090",

series = "Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics",

pages = "154--162",

booktitle = "Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133",

note = "9th SIAM International Conference on Data Mining 2009, SDM 2009 ; Conference date: 30-04-2009 Through 02-05-2009",

}

TY - GEN

T1 - Prior-free rare category detection

AU - He, Jingrui

AU - Carbonell, Jaime

PY - 2009

Y1 - 2009

AB - Rare category detection is an open challenge in machine learning. It plays a central role in applications such as detecting new financial fraud patterns, detecting new network malware, and scientific discovery, where rare categories are hidden among huge volumes of normal data and observations. In this paper, we propose a new method for rare category detection named SEDER, which requires no prior information about the data set. It implicitly performs semiparametric density estimation using specially designed exponential families, and then picks for labeling the examples where the neighborhood density changes the most. SEDER also works when the data are not separable. What distinguishes it from all existing methods is its prior-free nature, i.e., it does not require any prior information about the data set (e.g., the number of classes or the proportions of the different classes), which makes it more suitable for real applications. Experimental results on both synthetic and real data sets demonstrate the superiority of SEDER.

UR - http://www.scopus.com/inward/record.url?scp=72849151989&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72849151989&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:72849151989

SN - 9781615671090

T3 - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics

SP - 154

EP - 162

BT - Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics 133

T2 - 9th SIAM International Conference on Data Mining 2009, SDM 2009

Y2 - 30 April 2009 through 2 May 2009

ER -
