TY - GEN
T1 - Learning complex rare categories with dual heterogeneity
AU - Yang, Pei
AU - He, Jingrui
AU - Pan, Jia Yu
N1 - Funding Information:
Acknowledgment: This work is partially supported by the NSF (No. IIS1017415), the Army Research Laboratory (No. W911NF-09-2-0053), Region II University Transportation Center (No. 49997-33 25), DARPA (No. W911NF-11-C-0200 and W911NF-12-C-0028), and NSFC (No. 61473123).
Publisher Copyright:
Copyright © SIAM.
PY - 2015
Y1 - 2015
N2 - In the era of big data, self-similar rare categories in a large data set are often of great importance, such as malicious insiders in big organizations and defective IC devices in semiconductor manufacturing. Furthermore, such rare categories often exhibit multiple types of heterogeneity, such as task heterogeneity, which originates from data collected in multiple domains, and view heterogeneity, which originates from multiple information sources. Existing methods for learning rare categories mainly focus on homogeneous settings, i.e., a single task and a single view. In this paper, for the first time, we study complex rare categories with both task and view heterogeneity, and propose a novel optimization framework named M2LID. It introduces a boundary characterization metric to capture the sharp changes in density near the boundary of the rare categories in the feature space, and constructs a graph-based model to leverage both task and view heterogeneity. Furthermore, M2LID integrates the two in a mutually beneficial way. We also present an effective algorithm to solve this framework, analyze its performance from various aspects, and demonstrate its effectiveness on both synthetic and real datasets.
UR - http://www.scopus.com/inward/record.url?scp=84961873007&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84961873007&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84961873007
T3 - SIAM International Conference on Data Mining 2015, SDM 2015
SP - 523
EP - 531
BT - SIAM International Conference on Data Mining 2015, SDM 2015
A2 - Ye, Jieping
A2 - Venkatasubramanian, Suresh
PB - Society for Industrial and Applied Mathematics Publications
T2 - SIAM International Conference on Data Mining 2015, SDM 2015
Y2 - 30 April 2015 through 2 May 2015
ER -