AURORA: Auditing PageRank on Large Graphs

Jian Kang; Meijia Wang; Nan Cao; Yinglong Xia; Wei Fan; Hanghang Tong

doi:10.1109/BigData.2018.8622563

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Scopus citations

Abstract

Ranking on large-scale graphs plays a fundamental role in many high-impact application domains, ranging from information retrieval, recommender systems, sports team management, biology to neuroscience and many more. PageRank, together with many of its random walk based variants, has become one of the most well-known and widely used algorithms, due to its mathematical elegance and the superior performance across a variety of application domains. Important as it might be, the state of the art lacks an intuitive way to explain the ranking results by PageRank (or its variants), e.g., why it thinks the returned top-k webpages are the most important ones in the entire graph; why it gives a higher rank to actor John than actor Smith in terms of their relevance w.r.t. a particular movie? In order to answer these questions, this paper proposes a paradigm shift for PageRank, from identifying which nodes are most important to understanding why the ranking algorithm gives a particular ranking result. We formally define the PageRank auditing problem, whose central idea is to identify a set of key graph elements (e.g., edges, nodes, subgraphs) with the highest influence on the ranking results. We formulate it as an optimization problem and propose a family of effective and scalable algorithms (Aurora) to solve it. Our algorithms measure the influence of graph elements and incrementally select influential elements w.r.t. their gradients over the ranking results. We perform extensive empirical evaluations on real-world datasets, which demonstrate that the proposed methods (Aurora) provide intuitive explanations with linear scalability.
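The auditing idea in the abstract (score graph elements by their influence on the ranking, then select them incrementally) can be illustrated with a small sketch. This is not the Aurora algorithm: the abstract says influence is measured through gradients over the ranking results, whereas the code below assumes a brute-force stand-in that removes each edge, recomputes PageRank, and uses the L1 change in the ranking vector as the influence score. The function names `pagerank` and `audit_edges`, the damping factor 0.85, and the dense-matrix representation are all hypothetical choices made for illustration, and the approach only suits tiny graphs.

```python
import numpy as np


def pagerank(A, c=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; A[i, j] > 0 means an edge i -> j."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    out_deg = A.sum(axis=1)
    e = np.full(n, 1.0 / n)  # uniform teleportation vector (assumption)
    r = e.copy()
    for _ in range(max_iter):
        # Each node splits its score evenly over its out-edges.
        share = np.divide(r, out_deg, out=np.zeros(n), where=out_deg > 0)
        flow = A.T @ share
        # Nodes without out-edges redistribute their score uniformly.
        dangling = r[out_deg == 0].sum()
        r_new = c * (flow + dangling * e) + (1.0 - c) * e
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


def audit_edges(A, k, c=0.85):
    """Greedily pick k edges whose removal changes PageRank the most.

    Influence is the L1 change in the ranking vector after removing one
    edge (a brute-force assumption, not the gradient-based measure
    described in the abstract).
    """
    A = np.asarray(A, dtype=float).copy()
    chosen = []
    for _ in range(k):
        base = pagerank(A, c)
        best_edge, best_score = None, -1.0
        for i, j in zip(*np.nonzero(A)):
            weight = A[i, j]
            A[i, j] = 0.0  # tentatively remove the edge
            score = np.abs(pagerank(A, c) - base).sum()
            A[i, j] = weight  # restore it
            if score > best_score:
                best_edge, best_score = (int(i), int(j)), score
        if best_edge is None:  # no edges left to audit
            break
        chosen.append(best_edge)
        A[best_edge] = 0.0  # remove the selected edge before the next round
    return chosen
```

Calling `audit_edges(A, k)` on a small adjacency matrix returns a list of `(source, target)` pairs in the order they were selected. Recomputing PageRank for every candidate edge costs far more than a linear-time method would, which is the kind of overhead a gradient-based influence measure is meant to avoid.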

Original language	English (US)
Title of host publication	Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors	Yang Song, Bing Liu, Kisung Lee, Naoki Abe, Calton Pu, Mu Qiao, Nesreen Ahmed, Donald Kossmann, Jeffrey Saltz, Jiliang Tang, Jingrui He, Huan Liu, Xiaohua Hu
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	713-722
Number of pages	10
ISBN (Electronic)	9781538650356
DOIs	https://doi.org/10.1109/BigData.2018.8622563
State	Published - Jan 22 2019
Event	2018 IEEE International Conference on Big Data, Big Data 2018, Seattle, United States; Duration: Dec 10 2018 → Dec 13 2018

Publication series

Name	Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference	2018 IEEE International Conference on Big Data, Big Data 2018
Country/Territory	United States
City	Seattle
Period	12/10/18 → 12/13/18

Keywords

Graph mining
PageRank
explainability

ASJC Scopus subject areas

Computer Science Applications
Information Systems

Access to Document

10.1109/BigData.2018.8622563

Cite this

Kang, J., Wang, M., Cao, N., Xia, Y., Fan, W., & Tong, H. (2019). AURORA: Auditing PageRank on Large Graphs. In Y. Song, B. Liu, K. Lee, N. Abe, C. Pu, M. Qiao, N. Ahmed, D. Kossmann, J. Saltz, J. Tang, J. He, H. Liu, & X. Hu (Eds.), Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018 (pp. 713-722). Article 8622563 (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2018.8622563

AURORA: Auditing PageRank on Large Graphs. / Kang, Jian; Wang, Meijia; Cao, Nan et al.
Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018. ed. / Yang Song; Bing Liu; Kisung Lee; Naoki Abe; Calton Pu; Mu Qiao; Nesreen Ahmed; Donald Kossmann; Jeffrey Saltz; Jiliang Tang; Jingrui He; Huan Liu; Xiaohua Hu. Institute of Electrical and Electronics Engineers Inc., 2019. p. 713-722 8622563 (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018).

Kang, J, Wang, M, Cao, N, Xia, Y, Fan, W & Tong, H 2019, AURORA: Auditing PageRank on Large Graphs. in Y Song, B Liu, K Lee, N Abe, C Pu, M Qiao, N Ahmed, D Kossmann, J Saltz, J Tang, J He, H Liu & X Hu (eds), Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018., 8622563, Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018, Institute of Electrical and Electronics Engineers Inc., pp. 713-722, 2018 IEEE International Conference on Big Data, Big Data 2018, Seattle, United States, 12/10/18. https://doi.org/10.1109/BigData.2018.8622563

Kang J, Wang M, Cao N, Xia Y, Fan W, Tong H. AURORA: Auditing PageRank on Large Graphs. In Song Y, Liu B, Lee K, Abe N, Pu C, Qiao M, Ahmed N, Kossmann D, Saltz J, Tang J, He J, Liu H, Hu X, editors, Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018. Institute of Electrical and Electronics Engineers Inc. 2019. p. 713-722. 8622563. (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018). doi: 10.1109/BigData.2018.8622563

Kang, Jian ; Wang, Meijia ; Cao, Nan et al. / AURORA : Auditing PageRank on Large Graphs. Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018. editor / Yang Song ; Bing Liu ; Kisung Lee ; Naoki Abe ; Calton Pu ; Mu Qiao ; Nesreen Ahmed ; Donald Kossmann ; Jeffrey Saltz ; Jiliang Tang ; Jingrui He ; Huan Liu ; Xiaohua Hu. Institute of Electrical and Electronics Engineers Inc., 2019. pp. 713-722 (Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018).

@inproceedings{435ee35a2829406cbfd927057c5089b3,

title = "AURORA: Auditing PageRank on Large Graphs",

abstract = "Ranking on large-scale graphs plays a fundamental role in many high-impact application domains, ranging from information retrieval, recommender systems, sports team management, biology to neuroscience and many more. PageRank, together with many of its random walk based variants, has become one of the most well-known and widely used algorithms, due to its mathematical elegance and the superior performance across a variety of application domains. Important as it might be, the state of the art lacks an intuitive way to explain the ranking results by PageRank (or its variants), e.g., why it thinks the returned top-k webpages are the most important ones in the entire graph; why it gives a higher rank to actor John than actor Smith in terms of their relevance w.r.t. a particular movie? In order to answer these questions, this paper proposes a paradigm shift for PageRank, from identifying which nodes are most important to understanding why the ranking algorithm gives a particular ranking result. We formally define the PageRank auditing problem, whose central idea is to identify a set of key graph elements (e.g., edges, nodes, subgraphs) with the highest influence on the ranking results. We formulate it as an optimization problem and propose a family of effective and scalable algorithms (Aurora) to solve it. Our algorithms measure the influence of graph elements and incrementally select influential elements w.r.t. their gradients over the ranking results. We perform extensive empirical evaluations on real-world datasets, which demonstrate that the proposed methods (Aurora) provide intuitive explanations with linear scalability.",

keywords = "Graph mining, PageRank, explainability",

author = "Jian Kang and Meijia Wang and Nan Cao and Yinglong Xia and Wei Fan and Hanghang Tong",

note = "Funding Information: ACKNOWLEDGMENT This work is supported by NSF (IIS-1651203, IIS-1715385 and IIS-1743040), DTRA (HDTRA1-16-0017), ARO (W911NF-16-1-0168), DHS (2017-ST-061-QA0001), NSFC (61602306, Fundamental Research Funds for the Central Universities), and gifts from Huawei and Baidu. Publisher Copyright: {\textcopyright} 2018 IEEE.; 2018 IEEE International Conference on Big Data, Big Data 2018 ; Conference date: 10-12-2018 Through 13-12-2018",

year = "2019",

month = jan,

day = "22",

doi = "10.1109/BigData.2018.8622563",

language = "English (US)",

series = "Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "713--722",

editor = "Yang Song and Bing Liu and Kisung Lee and Naoki Abe and Calton Pu and Mu Qiao and Nesreen Ahmed and Donald Kossmann and Jeffrey Saltz and Jiliang Tang and Jingrui He and Huan Liu and Xiaohua Hu",

booktitle = "Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018",

}

TY - GEN

T1 - AURORA: Auditing PageRank on Large Graphs

T2 - 2018 IEEE International Conference on Big Data, Big Data 2018

AU - Kang, Jian

AU - Wang, Meijia

AU - Cao, Nan

AU - Xia, Yinglong

AU - Fan, Wei

AU - Tong, Hanghang

N1 - Funding Information: ACKNOWLEDGMENT This work is supported by NSF (IIS-1651203, IIS-1715385 and IIS-1743040), DTRA (HDTRA1-16-0017), ARO (W911NF-16-1-0168), DHS (2017-ST-061-QA0001), NSFC (61602306, Fundamental Research Funds for the Central Universities), and gifts from Huawei and Baidu. Publisher Copyright: © 2018 IEEE.

PY - 2019/1/22

Y1 - 2019/1/22

N2 - Ranking on large-scale graphs plays a fundamental role in many high-impact application domains, ranging from information retrieval, recommender systems, sports team management, biology to neuroscience and many more. PageRank, together with many of its random walk based variants, has become one of the most well-known and widely used algorithms, due to its mathematical elegance and the superior performance across a variety of application domains. Important as it might be, the state of the art lacks an intuitive way to explain the ranking results by PageRank (or its variants), e.g., why it thinks the returned top-k webpages are the most important ones in the entire graph; why it gives a higher rank to actor John than actor Smith in terms of their relevance w.r.t. a particular movie? In order to answer these questions, this paper proposes a paradigm shift for PageRank, from identifying which nodes are most important to understanding why the ranking algorithm gives a particular ranking result. We formally define the PageRank auditing problem, whose central idea is to identify a set of key graph elements (e.g., edges, nodes, subgraphs) with the highest influence on the ranking results. We formulate it as an optimization problem and propose a family of effective and scalable algorithms (Aurora) to solve it. Our algorithms measure the influence of graph elements and incrementally select influential elements w.r.t. their gradients over the ranking results. We perform extensive empirical evaluations on real-world datasets, which demonstrate that the proposed methods (Aurora) provide intuitive explanations with linear scalability.

AB - Ranking on large-scale graphs plays a fundamental role in many high-impact application domains, ranging from information retrieval, recommender systems, sports team management, biology to neuroscience and many more. PageRank, together with many of its random walk based variants, has become one of the most well-known and widely used algorithms, due to its mathematical elegance and the superior performance across a variety of application domains. Important as it might be, the state of the art lacks an intuitive way to explain the ranking results by PageRank (or its variants), e.g., why it thinks the returned top-k webpages are the most important ones in the entire graph; why it gives a higher rank to actor John than actor Smith in terms of their relevance w.r.t. a particular movie? In order to answer these questions, this paper proposes a paradigm shift for PageRank, from identifying which nodes are most important to understanding why the ranking algorithm gives a particular ranking result. We formally define the PageRank auditing problem, whose central idea is to identify a set of key graph elements (e.g., edges, nodes, subgraphs) with the highest influence on the ranking results. We formulate it as an optimization problem and propose a family of effective and scalable algorithms (Aurora) to solve it. Our algorithms measure the influence of graph elements and incrementally select influential elements w.r.t. their gradients over the ranking results. We perform extensive empirical evaluations on real-world datasets, which demonstrate that the proposed methods (Aurora) provide intuitive explanations with linear scalability.

KW - Graph mining

KW - PageRank

KW - explainability

UR - http://www.scopus.com/inward/record.url?scp=85062601606&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85062601606&partnerID=8YFLogxK

U2 - 10.1109/BigData.2018.8622563

DO - 10.1109/BigData.2018.8622563

M3 - Conference contribution

AN - SCOPUS:85062601606

T3 - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

SP - 713

EP - 722

BT - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

A2 - Song, Yang

A2 - Liu, Bing

A2 - Lee, Kisung

A2 - Abe, Naoki

A2 - Pu, Calton

A2 - Qiao, Mu

A2 - Ahmed, Nesreen

A2 - Kossmann, Donald

A2 - Saltz, Jeffrey

A2 - Tang, Jiliang

A2 - He, Jingrui

A2 - Liu, Huan

A2 - Hu, Xiaohua

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 10 December 2018 through 13 December 2018

ER -
