Representation learning for imbalanced cross-domain classification

Lu Cheng, Ruocheng Guo, K. Selçuk Candan, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Scopus citation

Abstract

Deep architectures are trained on massive amounts of labeled data to guarantee classification performance. When labeled data are unavailable, domain adaptation is often an attractive option, provided that labeled data of a similar nature but from a different domain are available. Previous work has chiefly focused on learning domain-invariant representations but has overlooked label imbalance within a single domain or across domains, which is common in many machine learning applications such as fake news detection. In this paper, we study a new cross-domain classification problem in which the data in each domain can be imbalanced (data imbalance), i.e., the classes are not evenly distributed, and the ratio of positive to negative samples varies across domains (domain imbalance). This cross-domain problem is challenging because it entails covariate bias in the input feature space and representation bias in the latent space where domain-invariant representations are learned. To address this challenge, we propose an effective approach that leverages a doubly balancing strategy to simultaneously control these two types of bias and learn domain-invariant representations. The proposed method aims to learn representations that are (i) robust to data and domain imbalance, (ii) discriminative between classes, and (iii) invariant across domains. Extensive evaluations on two important real-world applications corroborate the effectiveness of the proposed framework.

Original language: English (US)
Title of host publication: Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020
Editors: Carlotta Demeniconi, Nitesh Chawla
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 478-486
Number of pages: 9
ISBN (Electronic): 9781611976236
DOIs
State: Published - 2020
Externally published: Yes
Event: 2020 SIAM International Conference on Data Mining, SDM 2020 - Cincinnati, United States
Duration: May 7 2020 to May 9 2020

Publication series

Name: Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020

Conference

Conference: 2020 SIAM International Conference on Data Mining, SDM 2020
Country: United States
City: Cincinnati
Period: 5/7/20 to 5/9/20

Keywords

  • Data Imbalance
  • Domain Imbalance
  • Representation Learning
  • Unsupervised Domain Adaptation

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
