Learning complex rare categories with dual heterogeneity

Pei Yang; Jingrui He; Jia Yu Pan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In the era of big data, it is often the case that the self-similar rare categories in a large data set are of great importance, such as the malicious insiders in big organizations, and the IC devices with defects in semiconductor manufacturing. Furthermore, such rare categories often exhibit multiple types of heterogeneity, such as the task heterogeneity, which originates from data collected in multiple domains, and the view heterogeneity, which originates from multiple information sources. Existing methods for learning rare categories mainly focus on the homogeneous settings, i.e., a single task and a single view. In this paper, for the first time, we study complex rare categories with both task and view heterogeneity, and propose a novel optimization framework named M²LID. It introduces a boundary characterization metric to capture the sharp changes in density near the boundary of the rare categories in the feature space, and constructs a graph-based model to leverage both task and view heterogeneity. Furthermore, M²LID integrates them in a way of mutual benefit. We also present an effective algorithm to solve this framework, analyze its performance from various aspects, and demonstrate its effectiveness on both synthetic and real datasets.

Original language	English (US)
Title of host publication	SIAM International Conference on Data Mining 2015, SDM 2015
Editors	Jieping Ye, Suresh Venkatasubramanian
Publisher	Society for Industrial and Applied Mathematics Publications
Pages	523-531
Number of pages	9
ISBN (Electronic)	9781510811522
State	Published - 2015
Event	SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada
Duration	Apr 30 2015 → May 2 2015

Publication series

Name	SIAM International Conference on Data Mining 2015, SDM 2015

Other

Other	SIAM International Conference on Data Mining 2015, SDM 2015
Country/Territory	Canada
City	Vancouver
Period	4/30/15 → 5/2/15

ASJC Scopus subject areas

Computational Theory and Mathematics
Computer Vision and Pattern Recognition
Software

Cite this

Learning complex rare categories with dual heterogeneity. / Yang, Pei; He, Jingrui; Pan, Jia Yu.
SIAM International Conference on Data Mining 2015, SDM 2015. ed. / Jieping Ye; Suresh Venkatasubramanian. Society for Industrial and Applied Mathematics Publications, 2015. p. 523-531 (SIAM International Conference on Data Mining 2015, SDM 2015).

Yang, P, He, J & Pan, JY 2015, Learning complex rare categories with dual heterogeneity. in J Ye & S Venkatasubramanian (eds), SIAM International Conference on Data Mining 2015, SDM 2015. SIAM International Conference on Data Mining 2015, SDM 2015, Society for Industrial and Applied Mathematics Publications, pp. 523-531, SIAM International Conference on Data Mining 2015, SDM 2015, Vancouver, Canada, 4/30/15.

@inproceedings{61f25c08c3b14010bcfd5e224910899f,
title = "Learning complex rare categories with dual heterogeneity",
abstract = "In the era of big data, it is often the case that the self-similar rare categories in a large data set are of great importance, such as the malicious insiders in big organizations, and the IC devices with defects in semiconductor manufacturing. Furthermore, such rare categories often exhibit multiple types of heterogeneity, such as the task heterogeneity, which originates from data collected in multiple domains, and the view heterogeneity, which originates from multiple information sources. Existing methods for learning rare categories mainly focus on the homogeneous settings, i.e., a single task and a single view. In this paper, for the first time, we study complex rare categories with both task and view heterogeneity, and propose a novel optimization framework named M2LID. It introduces a boundary characterization metric to capture the sharp changes in density near the boundary of the rare categories in the feature space, and constructs a graph-based model to leverage both task and view heterogeneity. Furthermore, M2LID integrates them in a way of mutual benefit. We also present an effective algorithm to solve this framework, analyze its performance from various aspects, and demonstrate its effectiveness on both synthetic and real datasets.",

author = "Pei Yang and Jingrui He and Pan, {Jia Yu}",
note = "Funding Information: Acknowledgment: This work is partially supported by the NSF (No. IIS1017415), the Army Research Laboratory (No. W911NF-09-2-0053), Region II University Transportation Center (No. 49997-33 25), DARPA (No. W911NF-11-C-0200 and W911NF-12-C-0028), and NSFC (No. 61473123). Publisher Copyright: Copyright {\textcopyright} SIAM.; SIAM International Conference on Data Mining 2015, SDM 2015 ; Conference date: 30-04-2015 Through 02-05-2015",
year = "2015",
language = "English (US)",
series = "SIAM International Conference on Data Mining 2015, SDM 2015",
publisher = "Society for Industrial and Applied Mathematics Publications",
pages = "523--531",
editor = "Jieping Ye and Suresh Venkatasubramanian",
booktitle = "SIAM International Conference on Data Mining 2015, SDM 2015",
}

TY - GEN
T1 - Learning complex rare categories with dual heterogeneity
AU - Yang, Pei
AU - He, Jingrui
AU - Pan, Jia Yu

N1 - Funding Information: Acknowledgment: This work is partially supported by the NSF (No. IIS1017415), the Army Research Laboratory (No. W911NF-09-2-0053), Region II University Transportation Center (No. 49997-33 25), DARPA (No. W911NF-11-C-0200 and W911NF-12-C-0028), and NSFC (No. 61473123). Publisher Copyright: Copyright © SIAM.

PY - 2015

Y1 - 2015

N2 - In the era of big data, it is often the case that the self-similar rare categories in a large data set are of great importance, such as the malicious insiders in big organizations, and the IC devices with defects in semiconductor manufacturing. Furthermore, such rare categories often exhibit multiple types of heterogeneity, such as the task heterogeneity, which originates from data collected in multiple domains, and the view heterogeneity, which originates from multiple information sources. Existing methods for learning rare categories mainly focus on the homogeneous settings, i.e., a single task and a single view. In this paper, for the first time, we study complex rare categories with both task and view heterogeneity, and propose a novel optimization framework named M2LID. It introduces a boundary characterization metric to capture the sharp changes in density near the boundary of the rare categories in the feature space, and constructs a graph-based model to leverage both task and view heterogeneity. Furthermore, M2LID integrates them in a way of mutual benefit. We also present an effective algorithm to solve this framework, analyze its performance from various aspects, and demonstrate its effectiveness on both synthetic and real datasets.


UR - http://www.scopus.com/inward/record.url?scp=84961873007&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84961873007&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84961873007
T3 - SIAM International Conference on Data Mining 2015, SDM 2015
SP - 523
EP - 531
BT - SIAM International Conference on Data Mining 2015, SDM 2015
A2 - Ye, Jieping
A2 - Venkatasubramanian, Suresh
PB - Society for Industrial and Applied Mathematics Publications
T2 - SIAM International Conference on Data Mining 2015, SDM 2015
Y2 - 30 April 2015 through 2 May 2015
ER -
