Abstract

Rare category detection (RCD) is an important topic in data mining, focusing on identifying the initial examples from rare classes in imbalanced data sets. This problem becomes more challenging when the data is presented as time-evolving graphs, as used in synthetic ID detection and insider threat detection. Most existing techniques for RCD are designed for static data sets, thus not suitable for time-evolving RCD applications. To address this challenge, in this paper, we first propose two incremental RCD algorithms, SIRD and BIRD. They are built upon existing density-based techniques for RCD, and incrementally update the detection models, which provide 'time-flexible' RCD. Furthermore, based on BIRD, we propose a modified version named BIRD-LI to deal with the cases where the exact priors of the minority classes are not available. We also identify a critical task in RCD named query distribution. It aims to allocate the limited budget into multiple time steps, such that the initial examples from the rare classes are detected as early as possible with the minimum labeling cost. The proposed incremental RCD algorithms and various query distribution strategies are evaluated empirically on both synthetic and real data.

Original language: English (US)
Title of host publication: Proceedings - IEEE International Conference on Data Mining, ICDM
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1135-1140
Number of pages: 6
Volume: 2016-January
ISBN (Print): 9781467395038
DOIs: https://doi.org/10.1109/ICDM.2015.120
State: Published - Jan 5 2016
Event: 15th IEEE International Conference on Data Mining, ICDM 2015 - Atlantic City, United States
Duration: Nov 14 2015 to Nov 17 2015

Other

Other: 15th IEEE International Conference on Data Mining, ICDM 2015
Country: United States
City: Atlantic City
Period: 11/14/15 to 11/17/15

Fingerprint

Labeling
Data mining
Costs

Keywords

  • Incremental Learning
  • Rare Category Detection
  • Time-evolving Graph Mining

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Zhou, D., Wang, K., Cao, N., & He, J. (2016). Rare category detection on time-evolving graphs. In Proceedings - IEEE International Conference on Data Mining, ICDM (Vol. 2016-January, pp. 1135-1140). [7373448] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICDM.2015.120

Rare category detection on time-evolving graphs. / Zhou, Dawei; Wang, Kangyang; Cao, Nan; He, Jingrui.

Proceedings - IEEE International Conference on Data Mining, ICDM. Vol. 2016-January Institute of Electrical and Electronics Engineers Inc., 2016. p. 1135-1140 7373448.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Zhou, D, Wang, K, Cao, N & He, J 2016, Rare category detection on time-evolving graphs. in Proceedings - IEEE International Conference on Data Mining, ICDM. vol. 2016-January, 7373448, Institute of Electrical and Electronics Engineers Inc., pp. 1135-1140, 15th IEEE International Conference on Data Mining, ICDM 2015, Atlantic City, United States, 11/14/15. https://doi.org/10.1109/ICDM.2015.120
Zhou D, Wang K, Cao N, He J. Rare category detection on time-evolving graphs. In Proceedings - IEEE International Conference on Data Mining, ICDM. Vol. 2016-January. Institute of Electrical and Electronics Engineers Inc. 2016. p. 1135-1140. 7373448 https://doi.org/10.1109/ICDM.2015.120
Zhou, Dawei ; Wang, Kangyang ; Cao, Nan ; He, Jingrui. / Rare category detection on time-evolving graphs. Proceedings - IEEE International Conference on Data Mining, ICDM. Vol. 2016-January Institute of Electrical and Electronics Engineers Inc., 2016. pp. 1135-1140
@inproceedings{296edb08f67b4254bc1b9ffdfcf7edb8,
title = "Rare category detection on time-evolving graphs",
abstract = "Rare category detection (RCD) is an important topic in data mining, focusing on identifying the initial examples from rare classes in imbalanced data sets. This problem becomes more challenging when the data is presented as time-evolving graphs, as used in synthetic ID detection and insider threat detection. Most existing techniques for RCD are designed for static data sets, thus not suitable for time-evolving RCD applications. To address this challenge, in this paper, we first propose two incremental RCD algorithms, SIRD and BIRD. They are built upon existing density-based techniques for RCD, and incrementally update the detection models, which provide 'time-flexible' RCD. Furthermore, based on BIRD, we propose a modified version named BIRD-LI to deal with the cases where the exact priors of the minority classes are not available. We also identify a critical task in RCD named query distribution. It aims to allocate the limited budget into multiple time steps, such that the initial examples from the rare classes are detected as early as possible with the minimum labeling cost. The proposed incremental RCD algorithms and various query distribution strategies are evaluated empirically on both synthetic and real data.",
keywords = "Incremental Learning, Rare Category Detection, Time-evolving Graph Mining",
author = "Dawei Zhou and Kangyang Wang and Nan Cao and Jingrui He",
year = "2016",
month = "1",
day = "5",
doi = "10.1109/ICDM.2015.120",
language = "English (US)",
isbn = "9781467395038",
volume = "2016-January",
pages = "1135--1140",
booktitle = "Proceedings - IEEE International Conference on Data Mining, ICDM",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Rare category detection on time-evolving graphs

AU - Zhou, Dawei

AU - Wang, Kangyang

AU - Cao, Nan

AU - He, Jingrui

PY - 2016/1/5

Y1 - 2016/1/5

N2 - Rare category detection (RCD) is an important topic in data mining, focusing on identifying the initial examples from rare classes in imbalanced data sets. This problem becomes more challenging when the data is presented as time-evolving graphs, as used in synthetic ID detection and insider threat detection. Most existing techniques for RCD are designed for static data sets, thus not suitable for time-evolving RCD applications. To address this challenge, in this paper, we first propose two incremental RCD algorithms, SIRD and BIRD. They are built upon existing density-based techniques for RCD, and incrementally update the detection models, which provide 'time-flexible' RCD. Furthermore, based on BIRD, we propose a modified version named BIRD-LI to deal with the cases where the exact priors of the minority classes are not available. We also identify a critical task in RCD named query distribution. It aims to allocate the limited budget into multiple time steps, such that the initial examples from the rare classes are detected as early as possible with the minimum labeling cost. The proposed incremental RCD algorithms and various query distribution strategies are evaluated empirically on both synthetic and real data.

AB - Rare category detection (RCD) is an important topic in data mining, focusing on identifying the initial examples from rare classes in imbalanced data sets. This problem becomes more challenging when the data is presented as time-evolving graphs, as used in synthetic ID detection and insider threat detection. Most existing techniques for RCD are designed for static data sets, thus not suitable for time-evolving RCD applications. To address this challenge, in this paper, we first propose two incremental RCD algorithms, SIRD and BIRD. They are built upon existing density-based techniques for RCD, and incrementally update the detection models, which provide 'time-flexible' RCD. Furthermore, based on BIRD, we propose a modified version named BIRD-LI to deal with the cases where the exact priors of the minority classes are not available. We also identify a critical task in RCD named query distribution. It aims to allocate the limited budget into multiple time steps, such that the initial examples from the rare classes are detected as early as possible with the minimum labeling cost. The proposed incremental RCD algorithms and various query distribution strategies are evaluated empirically on both synthetic and real data.

KW - Incremental Learning

KW - Rare Category Detection

KW - Time-evolving Graph Mining

UR - http://www.scopus.com/inward/record.url?scp=84963626816&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84963626816&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2015.120

DO - 10.1109/ICDM.2015.120

M3 - Conference contribution

AN - SCOPUS:84963626816

SN - 9781467395038

VL - 2016-January

SP - 1135

EP - 1140

BT - Proceedings - IEEE International Conference on Data Mining, ICDM

PB - Institute of Electrical and Electronics Engineers Inc.

ER -