Reducing seed noise in personalized PageRank

Shengyu Huang; Xinsheng Li; Kasim Candan; Maria Luisa Sapino

doi:10.1007/s13278-015-0309-6

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Network-based recommendation systems leverage the topology of the underlying graph and the current user context to rank objects in the database. Random walk-based techniques, such as PageRank, encode the structure of the graph as the transition matrix of a stochastic process, from which the significance of each node in the graph is inferred. Personalized PageRank (PPR) techniques complement this with a seed node set that serves as the personalization context. In this paper, we note (and experimentally show) that PPR algorithms that do not differentiate among the seed nodes may not properly rank nodes when the seed set is incomplete and/or noisy. To tackle this problem, we propose alternative robust personalized PageRank (RPR) strategies, which are insensitive to noise in the set of seed nodes and in which the rankings are not overly biased towards the seed nodes. In particular, we show that novel teleportation-discounting and seed-set maximal PPR techniques help eliminate the harmful bias of individual seed nodes and provide effective seed differentiation, leading to more accurate rankings. We also show that the proposed techniques lead to efficient implementations, in which existing approximation algorithms and/or parallel implementations for computing the PPR scores can be easily leveraged. Moreover, the proposed formulations are reuse-promoting, in the sense that it is possible to divide the work relative to individual seed nodes and cache the intermediate results obtained during the computation. Especially in systems with large query throughputs, it may also be possible to cluster queries based on the partial overlaps between their seed sets and reduce the overall robust PPR computation costs. Experimental results show that the proposed techniques are efficient and highly effective in improving recommendations and eliminating unwanted bias due to imperfections in the seed set.
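For readers unfamiliar with the baseline, the following is a minimal Python sketch of conventional personalized PageRank computed by power iteration. It is not the robust (RPR) strategies proposed in the paper, whose formulations are not given in this record. The function name `ppr`, the toy `graph`, the damping value `alpha=0.85`, and the iteration count are all assumptions made for illustration. The sketch also checks a well-known property of standard PPR that may relate to the per-seed work division mentioned in the abstract: the score vector for a uniformly weighted seed set equals the average of the single-seed score vectors, so per-seed results could in principle be cached and combined.

```python
def ppr(adj, seeds, alpha=0.85, iters=200):
    """Standard personalized PageRank by power iteration (illustrative only).

    adj   : dict mapping each node to a list of its out-neighbors
    seeds : nodes that receive the teleportation mass, weighted uniformly
    alpha : probability of following an edge; 1 - alpha is the teleport probability
    """
    nodes = list(adj)
    for u in nodes:
        if not adj[u]:
            # Assumption of this sketch: no dangling nodes.
            raise ValueError(f"node {u!r} has no out-edges")
    teleport = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    score = dict(teleport)
    for _ in range(iters):
        # Every step first returns (1 - alpha) of the mass to the seeds ...
        nxt = {v: (1.0 - alpha) * teleport[v] for v in nodes}
        # ... then spreads the remaining mass evenly along out-edges.
        for u in nodes:
            share = alpha * score[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        score = nxt
    return score


# Hypothetical toy graph: an undirected path a - b - c - d - e.
graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
seeds = ["a", "b"]

# PPR computed directly for the whole seed set.
combined = ppr(graph, seeds)

# PPR computed per seed, then averaged; the per-seed vectors could be cached.
per_seed = {s: ppr(graph, [s]) for s in seeds}
averaged = {v: sum(per_seed[s][v] for s in seeds) / len(seeds) for v in graph}

for v in graph:
    print(v, round(combined[v], 4), round(averaged[v], 4))
```

In this sketch every seed contributes equally to the teleportation vector, which is the undifferentiated treatment of seeds that the abstract identifies as a problem when the seed set is noisy.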

Original language	English (US)
Article number	6
Pages (from-to)	1-25
Number of pages	25
Journal	Social Network Analysis and Mining
Volume	6
Issue number	1
DOIs	https://doi.org/10.1007/s13278-015-0309-6
State	Published - Dec 1 2016

ASJC Scopus subject areas

Information Systems
Communication
Media Technology
Human-Computer Interaction
Computer Science Applications

Cite this

@article{9c27786d71614fc5a3f97ac5d9f16ea1,

title = "Reducing seed noise in personalized PageRank",

abstract = "Network-based recommendation systems leverage the topology of the underlying graph and the current user context to rank objects in the database. Random walk-based techniques, such as PageRank, encode the structure of the graph in the form of a transition matrix of a stochastic process from which the significances of the nodes in the graph are inferred. Personalized PageRank (PPR) techniques complement this with a seed node set which serves as the personalization context. In this paper, we note (and experimentally show) that PPR algorithms that do not differentiate among the seed nodes may not properly rank nodes in situations where the seed set is incomplete and/or noisy. To tackle this problem, we propose alternative robust personalized PageRank (RPR) strategies, which are insensitive to noise in the set of seed nodes and in which the rankings are not overly biased towards the seed nodes. In particular, we show that novel teleportation-discounting and seed-set maximal PPR techniques help eliminate harmful bias of individual seed nodes and provide effective seed differentiation to lead to more accurate rankings. We also show that the proposed techniques lead to efficient implementations, where existing approximation algorithms and/or parallel implementations for computing the PPR scores can be easily leveraged. Moreover, the proposed formulations are reuse-promoting in the sense that, it is possible to divide the work relative to individual seed nodes and cache the intermediary results obtained during the computation, and especially in systems with large query throughputs, it may be possible to cluster queries based on the partial overlaps between the seed sets and reduce the overall robust PPR computation costs. Experiment results show that the proposed techniques are efficient and highly effective in improving recommendations and eliminating unwanted bias due to imperfections in the seed set.",

author = "Shengyu Huang and Xinsheng Li and Kasim Candan and Sapino, {Maria Luisa}",

note = "Funding Information: This paper is the extended version of Shengyu Huang, Xinsheng Li, K. Sel{\c c}uk Candan, Maria Luisa Sapino. “Can you really trust that seed?”: Reducing the Impact of Seed Noise in Personalized PageRank. International Conference on Advances in Social Network Analysis and Mining (ASONAM). Beijing, China. 2014. This work is supported by NSF Grants 1339835 “E-SDMS: Energy Simulation Data Management System Software” and 1318788 “Data Management for Real-Time Data Driven Epidemic Spread Simulations”. This work is also supported in part by a CES Grant “Large-scale Data-driven Sensing and Analytics for Dynamic Failure Prediction”. Publisher Copyright: {\textcopyright} 2016, Springer-Verlag Wien.",

year = "2016",

month = dec,

day = "1",

doi = "10.1007/s13278-015-0309-6",

language = "English (US)",

volume = "6",

pages = "1--25",

journal = "Social Network Analysis and Mining",

issn = "1869-5450",

publisher = "Springer Wien",

number = "1",

}

TY - JOUR

T1 - Reducing seed noise in personalized PageRank

AU - Huang, Shengyu

AU - Li, Xinsheng

AU - Candan, Kasim

AU - Sapino, Maria Luisa

N1 - Funding Information: This paper is the extended version of Shengyu Huang, Xinsheng Li, K. Selçuk Candan, Maria Luisa Sapino. “Can you really trust that seed?”: Reducing the Impact of Seed Noise in Personalized PageRank. International Conference on Advances in Social Network Analysis and Mining (ASONAM). Beijing, China. 2014. This work is supported by NSF Grants 1339835 “E-SDMS: Energy Simulation Data Management System Software” and 1318788 “Data Management for Real-Time Data Driven Epidemic Spread Simulations”. This work is also supported in part by a CES Grant “Large-scale Data-driven Sensing and Analytics for Dynamic Failure Prediction”. Publisher Copyright: © 2016, Springer-Verlag Wien.

PY - 2016/12/1

Y1 - 2016/12/1

AB - Network-based recommendation systems leverage the topology of the underlying graph and the current user context to rank objects in the database. Random walk-based techniques, such as PageRank, encode the structure of the graph in the form of a transition matrix of a stochastic process from which the significances of the nodes in the graph are inferred. Personalized PageRank (PPR) techniques complement this with a seed node set which serves as the personalization context. In this paper, we note (and experimentally show) that PPR algorithms that do not differentiate among the seed nodes may not properly rank nodes in situations where the seed set is incomplete and/or noisy. To tackle this problem, we propose alternative robust personalized PageRank (RPR) strategies, which are insensitive to noise in the set of seed nodes and in which the rankings are not overly biased towards the seed nodes. In particular, we show that novel teleportation-discounting and seed-set maximal PPR techniques help eliminate harmful bias of individual seed nodes and provide effective seed differentiation to lead to more accurate rankings. We also show that the proposed techniques lead to efficient implementations, where existing approximation algorithms and/or parallel implementations for computing the PPR scores can be easily leveraged. Moreover, the proposed formulations are reuse-promoting in the sense that, it is possible to divide the work relative to individual seed nodes and cache the intermediary results obtained during the computation, and especially in systems with large query throughputs, it may be possible to cluster queries based on the partial overlaps between the seed sets and reduce the overall robust PPR computation costs. Experiment results show that the proposed techniques are efficient and highly effective in improving recommendations and eliminating unwanted bias due to imperfections in the seed set.

UR - http://www.scopus.com/inward/record.url?scp=84953237035&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84953237035&partnerID=8YFLogxK

U2 - 10.1007/s13278-015-0309-6

DO - 10.1007/s13278-015-0309-6

M3 - Article

AN - SCOPUS:84953237035

SN - 1869-5450

VL - 6

SP - 1

EP - 25

JO - Social Network Analysis and Mining

JF - Social Network Analysis and Mining

IS - 1

M1 - 6

ER -
