An effective framework for characterizing rare categories

Jingrui He, Hanghang Tong, Jaime Carbonell

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Rare categories are becoming increasingly abundant, yet their characterization has received little attention thus far. Fraudulent banking transactions, network intrusions, and rare diseases are examples of rare classes whose detection and characterization are of high value. However, accurate characterization is challenging due to high skewness and non-separability from the majority classes; e.g., fraudulent transactions masquerade as legitimate ones. This paper proposes the RACH algorithm, which exploits the compactness property of the rare categories. The algorithm is semi-supervised in nature, since it uses both labeled and unlabeled data. It is based on an optimization framework that encloses the rare examples in a minimum-radius hyperball. The framework is then converted into a convex optimization problem, which is in turn solved effectively in its dual form by the projected subgradient method. RACH can be naturally kernelized. Experimental results validate the effectiveness of RACH.
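The abstract names the ingredients of RACH (a minimum-radius hyperball, a convex problem solved in its dual by projected subgradient steps, and kernelization) but not its exact objective, so the sketch below is not RACH. It is a minimal Python/NumPy illustration of those ingredients under simplifying assumptions: only labeled rare examples are enclosed (no unlabeled data and no slack terms), the dual is the standard minimum enclosing ball dual over the probability simplex, and because that dual is smooth, each projected subgradient step reduces to a projected gradient step. All function names and the RBF `gamma` value are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto the simplex {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]                      # entries in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    rho = ks[u - (css - 1.0) / ks > 0][-1]    # number of entries kept positive
    theta = (css[rho - 1] - 1.0) / rho        # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


def linear_kernel(A, B):
    return A @ B.T


def rbf_kernel(A, B, gamma=0.5):
    # Hypothetical kernel and gamma, chosen only for illustration.
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def min_enclosing_ball(X, kernel=linear_kernel, steps=2000):
    """Minimum-radius hyperball around the rows of X, solved in the dual.

    Dual problem: maximize  sum_i a_i K_ii - a^T K a  over the simplex.
    The center is c = sum_j a_j phi(x_j); only kernel values are used.
    Returns the dual weights a and the squared radius R^2.
    """
    K = kernel(X, X)
    diag_k = np.diag(K)
    a = np.full(len(X), 1.0 / len(X))          # start at the simplex barycenter
    # Step size 1/L, with L = 2 * lambda_max(K) the gradient's Lipschitz constant.
    lr = 1.0 / (2.0 * np.linalg.eigvalsh(K).max() + 1e-12)
    for _ in range(steps):
        grad = diag_k - 2.0 * K @ a            # gradient of the concave dual
        a = project_to_simplex(a + lr * grad)  # ascent step, then project back
    dist2 = diag_k - 2.0 * K @ a + a @ K @ a   # ||phi(x_i) - c||^2 for each i
    return a, dist2.max()


def sq_dist_to_center(Z, X, a, kernel=linear_kernel):
    """Squared feature-space distance from each row of Z to the ball center."""
    k_zz = np.diag(kernel(Z, Z))
    return k_zz - 2.0 * kernel(Z, X) @ a + a @ kernel(X, X) @ a


if __name__ == "__main__":
    # Toy "rare examples": corners of a square plus its center point.
    X = np.array([[0., 0.], [2., 0.], [0., 2.], [2., 2.], [1., 1.]])
    a, r2 = min_enclosing_ball(X)
    print("center:", X.T @ a, "R^2:", r2)
    Z = np.array([[1., 1.5], [5., 5.]])
    print("inside:", sq_dist_to_center(Z, X, a) <= r2)
```

Running the file on the five toy points should report a center near (1, 1) and a squared radius near 2, with the first query point inside the ball and the second outside. Passing `kernel=rbf_kernel` to both calls gives a kernelized variant without changing the solver, since the dual touches the data only through kernel values.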

Original language: English (US)
Pages (from-to): 154-165
Number of pages: 12
Journal: Frontiers of Computer Science in China
Volume: 6
Issue number: 2
DOI: 10.1007/s11704-012-2861-9
State: Published - Apr 2012
Externally published: Yes

Fingerprint

Transactions
Subgradient Method
Convex optimization
Banking
Skewness
Compactness
Radius
Optimization Problem
Optimization
Experimental Results
Framework
Class
Form

Keywords

  • characterization
  • compactness
  • hyperball
  • minority class
  • optimization
  • rare category
  • subgradient

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

An effective framework for characterizing rare categories. / He, Jingrui; Tong, Hanghang; Carbonell, Jaime.

In: Frontiers of Computer Science in China, Vol. 6, No. 2, 04.2012, p. 154-165.

@article{89b993b0586a4d4abb885b1f8b0746c8,
title = "An effective framework for characterizing rare categories",
abstract = "Rare categories become more and more abundant and their characterization has received little attention thus far. Fraudulent banking transactions, network intrusions, and rare diseases are examples of rare classes whose detection and characterization are of high value. However, accurate characterization is challenging due to high-skewness and nonseparability from majority classes, e. g., fraudulent transactions masquerade as legitimate ones. This paper proposes the RACH algorithm by exploring the compactness property of the rare categories. This algorithm is semi-supervised in nature since it uses both labeled and unlabeled data. It is based on an optimization framework which encloses the rare examples by a minimum-radius hyperball. The framework is then converted into a convex optimization problem, which is in turn effectively solved in its dual form by the projected subgradient method. RACH can be naturally kernelized. Experimental results validate the effectiveness of RACH.",
keywords = "characterization, compactness, hyperball, minority class, optimization, rare category, subgradient",
author = "Jingrui He and Hanghang Tong and Jaime Carbonell",
year = "2012",
month = "4",
doi = "10.1007/s11704-012-2861-9",
language = "English (US)",
volume = "6",
pages = "154--165",
journal = "Frontiers of Computer Science",
issn = "2095-2228",
publisher = "Springer Science + Business Media",
number = "2",

}

TY - JOUR

T1 - An effective framework for characterizing rare categories

AU - He, Jingrui

AU - Tong, Hanghang

AU - Carbonell, Jaime

PY - 2012/4

Y1 - 2012/4

AB - Rare categories become more and more abundant and their characterization has received little attention thus far. Fraudulent banking transactions, network intrusions, and rare diseases are examples of rare classes whose detection and characterization are of high value. However, accurate characterization is challenging due to high-skewness and nonseparability from majority classes, e. g., fraudulent transactions masquerade as legitimate ones. This paper proposes the RACH algorithm by exploring the compactness property of the rare categories. This algorithm is semi-supervised in nature since it uses both labeled and unlabeled data. It is based on an optimization framework which encloses the rare examples by a minimum-radius hyperball. The framework is then converted into a convex optimization problem, which is in turn effectively solved in its dual form by the projected subgradient method. RACH can be naturally kernelized. Experimental results validate the effectiveness of RACH.

KW - characterization

KW - compactness

KW - hyperball

KW - minority class

KW - optimization

KW - rare category

KW - subgradient

UR - http://www.scopus.com/inward/record.url?scp=84859170110&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84859170110&partnerID=8YFLogxK

U2 - 10.1007/s11704-012-2861-9

DO - 10.1007/s11704-012-2861-9

M3 - Article

VL - 6

SP - 154

EP - 165

JO - Frontiers of Computer Science

JF - Frontiers of Computer Science

SN - 2095-2228

IS - 2

ER -