Abstract

Similarity search is a fundamental problem in network analysis with many applications, such as collaborator recommendation in coauthor networks, friend recommendation in social networks, and relation prediction in medical information networks. In this article, we propose a sampling-based method that uses random paths to efficiently estimate similarities based on both common neighbors and structural contexts in very large homogeneous or heterogeneous information networks. We give a theoretical guarantee that the sample size depends on the error bound ϵ, the confidence level (1 - δ), and the path length T of each random walk. We perform an extensive empirical study on a Tencent microblogging network of 1,000,000,000 edges. We show that our algorithm can return the top-k similar vertices for any vertex in a network 300× faster than state-of-the-art methods. We develop a prototype system for recommending similar authors to demonstrate the effectiveness of our method.
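
To make the idea of estimating similarity from sampled random paths concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes, purely for illustration, that the similarity of two vertices is estimated as the fraction of sampled paths that contain both of them. The function names (`sample_paths`, `estimate_similarity`, `top_k`), the toy graph, and the choice of 2000 paths are all hypothetical; in particular, the number of paths is a free parameter here rather than a value derived from ϵ, δ, and T.

```python
import random
from collections import defaultdict
from itertools import combinations


def sample_paths(adj, num_paths, path_len, rng):
    """Sample random walks of at most path_len steps from uniformly chosen start vertices."""
    vertices = sorted(adj)
    paths = []
    for _ in range(num_paths):
        v = rng.choice(vertices)
        path = [v]
        for _ in range(path_len):
            nbrs = adj[v]
            if not nbrs:  # dead end: stop this walk early
                break
            v = rng.choice(nbrs)
            path.append(v)
        paths.append(path)
    return paths


def estimate_similarity(paths):
    """Assumed estimator: fraction of sampled paths that contain both vertices of a pair."""
    counts = defaultdict(int)
    for path in paths:
        # Count each unordered pair at most once per path.
        for u, v in combinations(sorted(set(path)), 2):
            counts[(u, v)] += 1
    return {pair: c / len(paths) for pair, c in counts.items()}


def top_k(sim, query, k):
    """Return the k vertices with the highest estimated similarity to query."""
    scored = []
    for (u, v), s in sim.items():
        if u == query:
            scored.append((s, v))
        elif v == query:
            scored.append((s, u))
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest score first, ties by name
    return [(v, s) for s, v in scored[:k]]


# Hypothetical toy graph as adjacency lists (undirected).
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
rng = random.Random(0)  # fixed seed so repeated runs sample the same paths
paths = sample_paths(adj, num_paths=2000, path_len=3, rng=rng)
sim = estimate_similarity(paths)
result = top_k(sim, "a", k=2)
print(result)
```

Because every query is answered from the same pool of sampled paths, the sampling cost is paid once and each top-k lookup only scans the pair counts, which is the general reason a sampling approach of this kind can scale to very large networks.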

Original language: English (US)
Article number: 13
Journal: ACM Transactions on Information Systems
Volume: 36
Issue number: 2
DOI: https://doi.org/10.1145/3086695
State: Published - Aug 1 2017

Keywords

  • Heterogeneous information network
  • Random path
  • Similarity search
  • Social network
  • Vertex similarity

ASJC Scopus subject areas

  • Information Systems
  • Business, Management and Accounting (all)
  • Computer Science Applications

Cite this

Fast and flexible top-k similarity search on large networks. / Zhang, Jing; Tang, Jie; Ma, Cong; Tong, Hanghang; Jing, Yu; Li, Juanzi; Luyten, Walter; Moens, Marie Francine.

In: ACM Transactions on Information Systems, Vol. 36, No. 2, 13, 01.08.2017.

Research output: Contribution to journal › Article

@article{82762c4b34e54eaabd99f66d969195aa,
title = "Fast and flexible top-k similarity search on large networks",
keywords = "Heterogeneous information network, Random path, Similarity search, Social network, Vertex similarity",
author = "Jing Zhang and Jie Tang and Cong Ma and Hanghang Tong and Yu Jing and Juanzi Li and Walter Luyten and Moens, {Marie Francine}",
year = "2017",
month = "8",
day = "1",
doi = "10.1145/3086695",
language = "English (US)",
volume = "36",
journal = "ACM Transactions on Information Systems",
issn = "1046-8188",
publisher = "Association for Computing Machinery (ACM)",
number = "2",

}
