Learning complex rare categories with dual heterogeneity

Pei Yang, Jingrui He, Jia Yu Pan

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

5 Citations (Scopus)

Abstract

In the era of big data, self-similar rare categories in a large data set are often of great importance, for example malicious insiders in large organizations and defective IC devices in semiconductor manufacturing. Such rare categories often exhibit multiple types of heterogeneity, such as task heterogeneity, which originates from data collected in multiple domains, and view heterogeneity, which originates from multiple information sources. Existing methods for learning rare categories mainly focus on the homogeneous setting, i.e., a single task and a single view. In this paper, for the first time, we study complex rare categories with both task and view heterogeneity, and propose a novel optimization framework named M2LID. It introduces a boundary characterization metric that captures the sharp changes in density near the boundary of the rare categories in the feature space, and it constructs a graph-based model that leverages both task and view heterogeneity. M2LID integrates these two components so that they benefit each other. We also present an effective algorithm to solve this framework, analyze its performance from various aspects, and demonstrate its effectiveness on both synthetic and real datasets.
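The abstract describes the boundary characterization metric only in words: it should respond to sharp changes in density near the boundary of a rare category. As a rough illustration of that general idea only, the Python sketch below computes a generic nearest-neighbor density-change score. It is not the M2LID metric, whose exact definition is given in the paper and not here; the functions knn_density, neighbors_by_distance and density_change_scores, the parameters k and m, and the toy data are all hypothetical. Density at a point is approximated by the inverse distance to its k-th nearest neighbor, and each point is scored by the largest log-density drop to any of its m nearest neighbors.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def neighbors_by_distance(points: List[Point], i: int) -> List[int]:
    """Indices of all other points, sorted by Euclidean distance to points[i]."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))


def knn_density(points: List[Point], k: int) -> List[float]:
    """Hypothetical density proxy: inverse distance to the k-th nearest neighbor."""
    dens = []
    for i in range(len(points)):
        kth = neighbors_by_distance(points, i)[k - 1]
        # Small constant avoids division by zero for duplicate points.
        dens.append(1.0 / (math.dist(points[i], points[kth]) + 1e-12))
    return dens


def density_change_scores(points: List[Point], k: int = 2, m: int = 4) -> List[float]:
    """Largest log-density drop from each point to any of its m nearest neighbors.

    Illustrative only (not M2LID's metric): a point scores 0 when none of
    its m nearest neighbors is sparser than itself, and scores high when a
    much sparser point lies within its neighborhood.
    """
    dens = knn_density(points, k)
    scores = []
    for i in range(len(points)):
        nbrs = neighbors_by_distance(points, i)[:m]
        drop = max(math.log(dens[i] / dens[j]) for j in nbrs)
        scores.append(max(drop, 0.0))
    return scores


# Hypothetical 1-D toy data: a compact cluster near 0 plus sparse background points.
pts = [(0.0,), (0.1,), (0.2,), (0.3,), (2.0,), (4.0,), (6.0,), (8.0,)]
scores = density_change_scores(pts, k=2, m=4)
print([round(s, 3) for s in scores])
```

On this toy data the four points of the compact cluster receive higher scores than the sparse background points, because each of their neighborhoods reaches a much sparser point. The brute-force neighbor search costs O(n^2 log n) and is meant only for small examples; a spatial index would be needed at scale.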

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining 2015, SDM 2015
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 523-531
Number of pages: 9
ISBN (Print): 9781510811522
State: Published - 2015
Event: SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada
Duration: Apr 30 2015 to May 2 2015



ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Yang, P., He, J., & Pan, J. Y. (2015). Learning complex rare categories with dual heterogeneity. In SIAM International Conference on Data Mining 2015, SDM 2015 (pp. 523-531). Society for Industrial and Applied Mathematics Publications.

