Fast random walk with restart and its applications

Hanghang Tong; Christos Faloutsos; Jia Yu Pan

doi:10.1109/ICDM.2006.70

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

888 Scopus citations

Abstract

How closely related are two nodes in a graph? How to compute this score quickly, on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captioning of images, generalizations to the "connection subgraphs", personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale for large graphs, requiring either quadratic space and cubic pre-computation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) blockwise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in pre-computation and storage cost, and they achieve up to 150x speed up with 90%+ quality preservation.
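
To make the relevance score in the abstract concrete, here is a minimal sketch of one straightforward way to compute RWR scores: plain power iteration on a column-normalized adjacency matrix. It is not the fast method the paper proposes (low-rank approximation plus partitioning), only an illustration of the quantity being computed. The function name `rwr_scores`, the `restart` probability of 0.15, the tolerance, and the toy path graph are all assumptions made for this example.

```python
def rwr_scores(adj, seed, restart=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart from `seed` via power iteration.

    adj     : square list-of-lists of non-negative edge weights
    restart : probability of jumping back to the seed (assumed value)
    Returns a list of relevance scores, one per node.
    """
    n = len(adj)
    # Column-normalize so each column of w sums to 1 (transition matrix).
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [
        [adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # Restart vector: all probability mass on the seed node.
    e = [0.0] * n
    e[seed] = 1.0
    r = e[:]
    for _ in range(max_iter):
        # One step: follow an edge with prob. (1 - restart), else restart.
        nxt = [
            (1.0 - restart) * sum(w[i][j] * r[j] for j in range(n))
            + restart * e[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(nxt, r))
        r = nxt
        if delta < tol:
            break
    return r


# Toy example: an unweighted path graph 0 - 1 - 2 - 3, queried from node 0.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = rwr_scores(adj, seed=0)
```

On this connected toy graph the scores form a probability distribution, and nodes farther along the path from the seed receive smaller scores. Each query repeats the full iteration, which is the kind of per-query cost the abstract describes as slow for large graphs.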

Original language	English (US)
Title of host publication	Proceedings - Sixth International Conference on Data Mining, ICDM 2006
Pages	613-622
Number of pages	10
DOIs	https://doi.org/10.1109/ICDM.2006.70
State	Published - 2006
Externally published	Yes
Event	6th International Conference on Data Mining, ICDM 2006 - Hong Kong, China; Duration: Dec 18 2006 → Dec 22 2006

Publication series

Name	Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print)	1550-4786

Other

Other	6th International Conference on Data Mining, ICDM 2006
Country/Territory	China
City	Hong Kong
Period	12/18/06 → 12/22/06

ASJC Scopus subject areas

General Engineering

Cite this

Tong, H, Faloutsos, C & Pan, JY 2006, Fast random walk with restart and its applications. in Proceedings - Sixth International Conference on Data Mining, ICDM 2006., 4053087, Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 613-622, 6th International Conference on Data Mining, ICDM 2006, Hong Kong, China, 12/18/06. https://doi.org/10.1109/ICDM.2006.70

@inproceedings{e2a620c137074d66b6c22f12778e61ed,

title = "Fast random walk with restart and its applications",

abstract = "How closely related are two nodes in a graph? How to compute this score quickly, on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captioning of images, generalizations to the {"}connection subgraphs{"}, personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale for large graphs, requiring either quadratic space and cubic pre-computation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) blockwise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in pre-computation and storage cost, and they achieve up to 150x speed up with 90%+ quality preservation.",

author = "Hanghang Tong and Christos Faloutsos and Pan, {Jia Yu}",

year = "2006",

doi = "10.1109/ICDM.2006.70",

language = "English (US)",

isbn = "0769527019",

series = "Proceedings - IEEE International Conference on Data Mining, ICDM",

pages = "613--622",

booktitle = "Proceedings - Sixth International Conference on Data Mining, ICDM 2006",

note = "6th International Conference on Data Mining, ICDM 2006 ; Conference date: 18-12-2006 Through 22-12-2006",

}

TY - GEN

T1 - Fast random walk with restart and its applications

AU - Tong, Hanghang

AU - Faloutsos, Christos

AU - Pan, Jia Yu

PY - 2006

Y1 - 2006

N2 - How closely related are two nodes in a graph? How to compute this score quickly, on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captioning of images, generalizations to the "connection subgraphs", personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale for large graphs, requiring either quadratic space and cubic pre-computation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) blockwise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in pre-computation and storage cost, and they achieve up to 150x speed up with 90%+ quality preservation.

AB - How closely related are two nodes in a graph? How to compute this score quickly, on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captioning of images, generalizations to the "connection subgraphs", personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale for large graphs, requiring either quadratic space and cubic pre-computation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) blockwise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in pre-computation and storage cost, and they achieve up to 150x speed up with 90%+ quality preservation.

UR - http://www.scopus.com/inward/record.url?scp=34748827346&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34748827346&partnerID=8YFLogxK

U2 - 10.1109/ICDM.2006.70

DO - 10.1109/ICDM.2006.70

M3 - Conference contribution

AN - SCOPUS:34748827346

SN - 0769527019

SN - 9780769527017

T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

SP - 613

EP - 622

BT - Proceedings - Sixth International Conference on Data Mining, ICDM 2006

T2 - 6th International Conference on Data Mining, ICDM 2006

Y2 - 18 December 2006 through 22 December 2006

ER -
