TY - GEN
T1 - AURORA
T2 - 2018 IEEE International Conference on Big Data, Big Data 2018
AU - Kang, Jian
AU - Wang, Meijia
AU - Cao, Nan
AU - Xia, Yinglong
AU - Fan, Wei
AU - Tong, Hanghang
N1 - Funding Information:
This work is supported by NSF (IIS-1651203, IIS-1715385 and IIS-1743040), DTRA (HDTRA1-16-0017), ARO (W911NF-16-1-0168), DHS (2017-ST-061-QA0001), NSFC (61602306, Fundamental Research Funds for the Central Universities), and gifts from Huawei and Baidu.
Publisher Copyright:
© 2018 IEEE.
PY - 2019/1/22
Y1 - 2019/1/22
N2 - Ranking on large-scale graphs plays a fundamental role in many high-impact application domains, ranging from information retrieval, recommender systems, and sports team management to biology, neuroscience, and many more. PageRank, together with many of its random-walk-based variants, has become one of the most well-known and widely used algorithms, owing to its mathematical elegance and superior performance across a variety of application domains. Important as it is, the state of the art lacks an intuitive way to explain the ranking results produced by PageRank (or its variants), e.g., why does it consider the returned top-k webpages the most important ones in the entire graph, and why does it rank actor John higher than actor Smith in terms of their relevance w.r.t. a particular movie? In order to answer these questions, this paper proposes a paradigm shift for PageRank, from identifying which nodes are most important to understanding why the ranking algorithm gives a particular ranking result. We formally define the PageRank auditing problem, whose central idea is to identify a set of key graph elements (e.g., edges, nodes, subgraphs) with the highest influence on the ranking results. We formulate it as an optimization problem and propose a family of effective and scalable algorithms (Aurora) to solve it. Our algorithms measure the influence of graph elements and incrementally select influential elements w.r.t. their gradients over the ranking results. We perform extensive empirical evaluations on real-world datasets, which demonstrate that the proposed methods (Aurora) provide intuitive explanations with linear scalability.
KW - Graph mining
KW - PageRank
KW - explainability
UR - http://www.scopus.com/inward/record.url?scp=85062601606&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062601606&partnerID=8YFLogxK
U2 - 10.1109/BigData.2018.8622563
DO - 10.1109/BigData.2018.8622563
M3 - Conference contribution
AN - SCOPUS:85062601606
T3 - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
SP - 713
EP - 722
BT - Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
A2 - Song, Yang
A2 - Liu, Bing
A2 - Lee, Kisung
A2 - Abe, Naoki
A2 - Pu, Calton
A2 - Qiao, Mu
A2 - Ahmed, Nesreen
A2 - Kossmann, Donald
A2 - Saltz, Jeffrey
A2 - Tang, Jiliang
A2 - He, Jingrui
A2 - Liu, Huan
A2 - Hu, Xiaohua
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 December 2018 through 13 December 2018
ER -