Abstract

Large-scale graph data is now generated in a wide variety of real-world applications, from social networks and co-authorship networks to protein-protein interaction networks and road traffic networks. Many existing works on graph mining focus on vertices and edges, with the first-order Markov chain as the underlying model. They fail to explore high-order network structures, which are of key importance in many high-impact domains. For example, in bank customer personally identifiable information (PII) networks, star structures often correspond to a set of synthetic identities; in financial transaction networks, loop structures may indicate money laundering. In this paper, we focus on mining user-specified high-order network structures and aim to find a structure-rich subgraph whose separation from the rest of the graph breaks few such structures. A key challenge in finding a structure-rich subgraph is the prohibitive computational cost. To address this problem, we draw on the family of local graph clustering algorithms, which efficiently identify a low-conductance cut without exploring the entire graph, and generalize their key idea to model high-order network structures. In particular, we start with a generic definition of high-order conductance and define the high-order diffusion core, which is based on a high-order random walk induced by the user-specified high-order network structure. We then propose a novel High-Order Structure-Preserving LOcal Cut (HOS-PLOC) algorithm, which runs in polylogarithmic time with respect to the number of edges in the graph. It starts with a seed vertex and iteratively explores its neighborhood until a subgraph with a small high-order conductance is found. Furthermore, we analyze its performance in terms of both effectiveness and efficiency. The experimental results on both synthetic and real graphs demonstrate the effectiveness and efficiency of the proposed HOS-PLOC algorithm.
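
To make the notion of a structure-preserving cut concrete, the following is a minimal sketch, not the HOS-PLOC algorithm itself. It assumes the user-specified structure is a triangle and uses one common motif-style definition of conductance (triangles broken by the cut, divided by the smaller triangle volume of the two sides); the abstract does not spell out the paper's own definition, so this choice is an assumption. The function names `build_graph`, `find_triangles`, `triangle_conductance`, and `greedy_local_cut` are hypothetical, and the greedy expansion rescans all triangles at every step, so it does not have the polylogarithmic running time claimed for HOS-PLOC.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {vertex: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def find_triangles(adj):
    """Enumerate each triangle once as a triple (u, v, w) with u < v < w."""
    tris = []
    for u in adj:
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                tris.append((u, v, w))
    return tris


def triangle_conductance(tris, subset):
    """Assumed definition: broken triangles / min(triangle volume of each side)."""
    cut = vol_in = vol_out = 0
    for tri in tris:
        inside = sum(v in subset for v in tri)
        vol_in += inside
        vol_out += 3 - inside
        if 0 < inside < 3:  # triangle has vertices on both sides of the cut
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom else float("inf")


def greedy_local_cut(adj, seed, max_size):
    """Grow a set from `seed`, always adding the frontier vertex that gives
    the lowest triangle conductance, and return the best set seen."""
    tris = find_triangles(adj)
    current = {seed}
    best, best_phi = set(current), triangle_conductance(tris, current)
    while len(current) < max_size:
        frontier = sorted({n for v in current for n in adj[v]} - current)
        if not frontier:
            break
        nxt = min(frontier, key=lambda n: triangle_conductance(tris, current | {n}))
        current.add(nxt)
        phi = triangle_conductance(tris, current)
        if phi < best_phi:
            best, best_phi = set(current), phi
    return best, best_phi


if __name__ == "__main__":
    # Two 4-cliques joined by the single edge (3, 4); that edge lies in no triangle.
    edges = (list(combinations(range(4), 2))
             + list(combinations(range(4, 8), 2))
             + [(3, 4)])
    best, phi = greedy_local_cut(build_graph(edges), seed=0, max_size=6)
    print(sorted(best), phi)  # prints: [0, 1, 2, 3] 0.0
```

In this toy graph the cut that isolates the first clique removes one edge but breaks no triangle, so its triangle conductance is zero. That is the sense in which a cut can be cheap in high-order terms even though it is not free in edge terms.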

Original language: English (US)
Title of host publication: KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 655-664
Number of pages: 10
Volume: Part F129685
ISBN (Electronic): 9781450348874
DOI: https://doi.org/10.1145/3097983.3098015
State: Published - Aug 13 2017
Event: 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017 - Halifax, Canada
Duration: Aug 13 2017 to Aug 17 2017

Other

Other: 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017
Country: Canada
City: Halifax
Period: 8/13/17 to 8/17/17

Keywords

  • High-order network structure
  • Local clustering algorithm

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Zhou, D., Zhang, S., Yildirim, M. Y., Alcorn, S., Tong, H., Davulcu, H., & He, J. (2017). A local algorithm for structure-preserving graph cut. In KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Vol. Part F129685, pp. 655-664). Association for Computing Machinery. https://doi.org/10.1145/3097983.3098015

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

@inproceedings{2a49eb53e3e249609f3558592da008f8,
title = "A local algorithm for structure-preserving graph cut",
keywords = "High-order network structure, Local clustering algorithm",
author = "Dawei Zhou and Si Zhang and Yildirim, {Mehmet Yigit} and Scott Alcorn and Hanghang Tong and Hasan Davulcu and Jingrui He",
year = "2017",
month = "8",
day = "13",
doi = "10.1145/3097983.3098015",
language = "English (US)",
volume = "Part F129685",
pages = "655--664",
booktitle = "KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - A local algorithm for structure-preserving graph cut

AU - Zhou, Dawei

AU - Zhang, Si

AU - Yildirim, Mehmet Yigit

AU - Alcorn, Scott

AU - Tong, Hanghang

AU - Davulcu, Hasan

AU - He, Jingrui

PY - 2017/8/13

Y1 - 2017/8/13

KW - High-order network structure

KW - Local clustering algorithm

UR - http://www.scopus.com/inward/record.url?scp=85029042139&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85029042139&partnerID=8YFLogxK

U2 - 10.1145/3097983.3098015

DO - 10.1145/3097983.3098015

M3 - Conference contribution

VL - Part F129685

SP - 655

EP - 664

BT - KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

ER -