Abstract

In the era of big data, rare categories are often of great interest in high-impact applications, ranging from financial fraud detection in online transaction networks to emerging trend detection in social networks, and from network intrusion detection in computer networks to fault detection in manufacturing. As a result, rare category characterization, which aims to accurately characterize the rare categories given limited label information, has become a fundamental learning task. The unique challenge of rare category characterization, i.e., the non-separability of the rare categories from the majority classes, together with the availability of multi-modal representations of the examples, poses a new research question: how can we learn a salient rare-category-oriented embedding representation in which the rare examples are well separated from the majority class examples, thereby facilitating the follow-up rare category characterization? To address this question, inspired by the family of curriculum learning methods that simulate the cognitive mechanism of human beings, we propose a self-paced framework named SPARC that gradually learns the rare-category-oriented network representation and the characterization model in a mutually beneficial way by shifting from the 'easy' concept to the target 'difficult' one, in order to facilitate more reliable label propagation to the large number of unlabeled examples. Experimental results on various real data sets demonstrate that the proposed SPARC algorithm (1) shows a significant improvement over state-of-the-art graph embedding methods in representing rare categories that are non-separable from the majority classes, and (2) outperforms existing methods on rare category characterization tasks.
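
The 'easy'-to-'difficult' schedule mentioned in the abstract can be illustrated with a generic self-paced learning loop. The sketch below is not the SPARC algorithm and is not taken from the paper; it assumes the common hard-threshold weighting rule (an example is admitted when its current loss is below a pace parameter) and applies it to a toy problem, estimating a mean in the presence of an outlier. The function names, the toy data, and the parameters `lam`, `growth`, and `rounds` are all hypothetical and chosen only for illustration.

```python
from statistics import median


def self_paced_weights(losses, lam):
    """Hard self-paced weights: 1.0 for 'easy' examples (loss < lam), else 0.0."""
    return [1.0 if loss < lam else 0.0 for loss in losses]


def self_paced_mean(xs, lam=0.05, growth=4.0, rounds=3):
    """Estimate a mean while admitting examples from easy to hard.

    Each round alternates two steps:
      1. score every example with its squared error under the current estimate
         and admit only those whose loss is below the pace threshold `lam`;
      2. refit the estimate on the admitted examples, then enlarge `lam`.
    """
    mu = median(xs)  # warm start (an assumption of this sketch)
    admitted_per_round = []
    for _ in range(rounds):
        losses = [(x - mu) ** 2 for x in xs]
        v = self_paced_weights(losses, lam)
        total = sum(v)
        if total > 0:  # keep the previous estimate if nothing is admitted
            mu = sum(w * x for w, x in zip(v, xs)) / total
        admitted_per_round.append(int(total))
        lam *= growth  # relax the pace so harder examples can enter later
    return mu, admitted_per_round


# Toy data: four points near 1.0 and one hard example at 5.0.
xs = [1.0, 1.2, 0.8, 1.1, 5.0]
mu, counts = self_paced_mean(xs)
print(mu, counts)
```

With these toy settings the number of admitted examples grows across rounds, and the point at 5.0 is not admitted within three rounds because its loss stays above the threshold. A real system would replace the mean estimate with an actual model and loss.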

Original language: English (US)
Title of host publication: KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 2807-2816
Number of pages: 10
ISBN (Print): 9781450355520
DOIs: https://doi.org/10.1145/3219819.3219952
State: Published - Jul 19 2018
Event: 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018 - London, United Kingdom
Duration: Aug 19 2018 to Aug 23 2018

Other

Other: 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018
Country: United Kingdom
City: London
Period: 8/19/18 to 8/23/18

Fingerprint

Labels
Intrusion detection
Computer networks
Fault detection
Curricula
Availability
Big data

Keywords

  • Network Embedding
  • Rare Category Analysis
  • Self-Paced Learning

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Zhou, D., He, J., Yang, H., & Fan, W. (2018). SPARC: Self-paced network representation for few-shot rare category characterization. In KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2807-2816). Association for Computing Machinery. https://doi.org/10.1145/3219819.3219952

SPARC: Self-paced network representation for few-shot rare category characterization. / Zhou, Dawei; He, Jingrui; Yang, Hongxia; Fan, Wei.

KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2018. p. 2807-2816.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Zhou, D, He, J, Yang, H & Fan, W 2018, SPARC: Self-paced network representation for few-shot rare category characterization. in KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, pp. 2807-2816, 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018, London, United Kingdom, 8/19/18. https://doi.org/10.1145/3219819.3219952
Zhou D, He J, Yang H, Fan W. SPARC: Self-paced network representation for few-shot rare category characterization. In KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. 2018. p. 2807-2816 https://doi.org/10.1145/3219819.3219952
Zhou, Dawei; He, Jingrui; Yang, Hongxia; Fan, Wei. / SPARC: Self-paced network representation for few-shot rare category characterization. KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2018. pp. 2807-2816
@inproceedings{ef4cbc5fe2b94826860bc20f92f28084,
title = "SPARC: Self-paced network representation for few-shot rare category characterization",
abstract = "In the era of big data, it is often the rare categories that are of great interest in many high-impact applications, ranging from financial fraud detection in online transaction networks to emerging trend detection in social networks, from network intrusion detection in computer networks to fault detection in manufacturing. As a result, rare category characterization becomes a fundamental learning task, which aims to accurately characterize the rare categories given limited label information. The unique challenge of rare category characterization, i.e., the non-separability nature of the rare categories from the majority classes, together with the availability of the multi-modal representation of the examples, poses a new research question: how can we learn a salient rare category oriented embedding representation such that the rare examples are well separated from the majority class examples in the embedding space, which facilitates the follow-up rare category characterization? To address this question, inspired by the family of curriculum learning that simulates the cognitive mechanism of human beings, we propose a self-paced framework named SPARC that gradually learns the rare category oriented network representation and the characterization model in a mutually beneficial way by shifting from the 'easy' concept to the target 'difficult' one, in order to facilitate more reliable label propagation to the large number of unlabeled examples. The experimental results on various real data demonstrate that our proposed SPARC algorithm: (1) shows a significant improvement over state-of-the-art graph embedding methods on representing the rare categories that are non-separable from the majority classes; (2) outperforms the existing methods on rare category characterization tasks.",
keywords = "Network Embedding, Rare Category Analysis, Self-Paced Learning",
author = "Dawei Zhou and Jingrui He and Hongxia Yang and Wei Fan",
year = "2018",
month = "7",
day = "19",
doi = "10.1145/3219819.3219952",
language = "English (US)",
isbn = "9781450355520",
pages = "2807--2816",
booktitle = "KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - SPARC

T2 - Self-paced network representation for few-shot rare category characterization

AU - Zhou, Dawei

AU - He, Jingrui

AU - Yang, Hongxia

AU - Fan, Wei

PY - 2018/7/19

Y1 - 2018/7/19

AB - In the era of big data, it is often the rare categories that are of great interest in many high-impact applications, ranging from financial fraud detection in online transaction networks to emerging trend detection in social networks, from network intrusion detection in computer networks to fault detection in manufacturing. As a result, rare category characterization becomes a fundamental learning task, which aims to accurately characterize the rare categories given limited label information. The unique challenge of rare category characterization, i.e., the non-separability nature of the rare categories from the majority classes, together with the availability of the multi-modal representation of the examples, poses a new research question: how can we learn a salient rare category oriented embedding representation such that the rare examples are well separated from the majority class examples in the embedding space, which facilitates the follow-up rare category characterization? To address this question, inspired by the family of curriculum learning that simulates the cognitive mechanism of human beings, we propose a self-paced framework named SPARC that gradually learns the rare category oriented network representation and the characterization model in a mutually beneficial way by shifting from the 'easy' concept to the target 'difficult' one, in order to facilitate more reliable label propagation to the large number of unlabeled examples. The experimental results on various real data demonstrate that our proposed SPARC algorithm: (1) shows a significant improvement over state-of-the-art graph embedding methods on representing the rare categories that are non-separable from the majority classes; (2) outperforms the existing methods on rare category characterization tasks.

KW - Network Embedding

KW - Rare Category Analysis

KW - Self-Paced Learning

UR - http://www.scopus.com/inward/record.url?scp=85051496075&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051496075&partnerID=8YFLogxK

U2 - 10.1145/3219819.3219952

DO - 10.1145/3219819.3219952

M3 - Conference contribution

SN - 9781450355520

SP - 2807

EP - 2816

BT - KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

ER -