Abstract

Imbalanced data are common in many high-impact applications. One example is air traffic control: among the three types of accident causes, historical accident reports citing 'personnel issues' far outnumber those of the other two types ('aircraft issues' and 'environmental issues') combined. The resulting data set of accident reports is therefore highly imbalanced. At the same time, this data set can be naturally modeled as a network, with each node representing an accident report and each edge indicating the similarity of a pair of accident reports. Until now, most existing work on imbalanced data analysis has focused on the classification setting, and very little has been devoted to learning node representations for imbalanced networks. To bridge this gap, we first propose Vertex-Diminished Random Walk (VDRW) for imbalanced network analysis. It differs significantly from the existing Vertex Reinforced Random Walk in that it discourages the random particle from returning to nodes that have already been visited. This design is particularly suitable for imbalanced networks because the random particle is more likely to visit nodes from the same class, which is a desired property for learning node representations. Furthermore, based on VDRW, we propose a semi-supervised network representation learning framework named ImVerde for imbalanced networks, in which context sampling uses VDRW and the limited label information to create node-context pairs, and balanced-batch sampling adopts a simple under-sampling method to balance these pairs across classes. Experimental results demonstrate that ImVerde based on VDRW outperforms state-of-the-art algorithms for learning network representations from imbalanced data.
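
The two sampling ideas named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the VDRW transition rule, so the rule used here is an assumption: each neighbor's weight is scaled by `alpha ** visits`, which makes previously visited nodes less likely to be chosen again. The names `vertex_diminished_walk`, `balanced_batch`, and `alpha` are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def vertex_diminished_walk(adj, start, length, alpha=0.5, rng=None):
    """Random walk that discourages returning to visited nodes.

    adj:   dict mapping each node to a list of its neighbors.
    alpha: assumed decay factor in (0, 1]; a neighbor visited k times
           gets weight alpha ** k (illustrative rule, not the paper's).
    """
    rng = rng or random.Random()
    visits = Counter({start: 1})
    walk = [start]
    current = start
    for _ in range(length - 1):
        neighbors = adj[current]
        if not neighbors:
            break  # dead end: stop the walk early
        # Unvisited neighbors keep weight 1; visited ones are down-weighted.
        weights = [alpha ** visits[n] for n in neighbors]
        current = rng.choices(neighbors, weights=weights, k=1)[0]
        visits[current] += 1
        walk.append(current)
    return walk


def balanced_batch(pairs_by_class, rng=None):
    """Under-sample every class down to the size of the smallest class."""
    rng = rng or random.Random()
    k = min(len(pairs) for pairs in pairs_by_class.values())
    return {c: rng.sample(pairs, k) for c, pairs in pairs_by_class.items()}


if __name__ == "__main__":
    toy_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(vertex_diminished_walk(toy_graph, start=0, length=8))
    print(balanced_batch({"majority": list(range(10)), "minority": [10, 11]}))
```

With `alpha=1` the walk reduces to an ordinary uniform random walk, which makes the effect of the assumed decay easy to compare on a toy graph.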

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Yang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 871-880
Number of pages: 10
ISBN (Electronic): 9781538650356
DOI: https://doi.org/10.1109/BigData.2018.8622603
State: Published - Jan 22 2019
Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: Dec 10 2018 → Dec 13 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
Country: United States
City: Seattle
Period: 12/10/18 → 12/13/18

Keywords

  • Network representation
  • imbalanced data
  • random walk

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems

  • Cite this

    Wu, J., He, J., & Liu, Y. (2019). ImVerde: Vertex-Diminished Random Walk for Learning Imbalanced Network Representation. In Y. Song, B. Liu, K. Lee, N. Abe, C. Pu, M. Qiao, N. Ahmed, D. Kossmann, J. Saltz, J. Tang, J. He, H. Liu, & X. Hu (Eds.), Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018 (pp. 871-880). [8622603] (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2018.8622603