TY - GEN
T1 - Supervised link prediction using random walks
AU - Liu, Yuechang
AU - Tong, Hanghang
AU - Xie, Lei
AU - Tang, Yong
PY - 2015
Y1 - 2015
N2 - Network structures have become increasingly popular for representing big data over the last few years. As a result, network-based analysis techniques are applied to networks containing millions of nodes. Link prediction, an essential task in network analysis, helps uncover missing or unknown links between nodes. Random walk based methods have shown outstanding performance in this task. However, the primary bottlenecks for such methods are adapting to networks with different structures and dynamics, and scaling to large networks. Inspired by Random Walk with Restart (RWR), a promising approach for link prediction, this paper proposes a set of path-based features and a supervised learning technique, called Supervised Random Walk with Restart (SRWR), to identify missing links. We show that by using these features, a classifier can successfully order the potential links by their closeness to the query node. A new type of heterogeneous network, called the Generalized Bi-relation Network (GBN), is defined in this paper, upon which the novel structural features are introduced. Finally, experiments are performed on a disease-chemical-gene interaction network; the results show that SRWR significantly outperforms the standard RWR algorithm in terms of Area Under the ROC Curve (AUC) and performs as well as or better than the best algorithms in the field of gene prioritization.
AB - Network structures have become increasingly popular for representing big data over the last few years. As a result, network-based analysis techniques are applied to networks containing millions of nodes. Link prediction, an essential task in network analysis, helps uncover missing or unknown links between nodes. Random walk based methods have shown outstanding performance in this task. However, the primary bottlenecks for such methods are adapting to networks with different structures and dynamics, and scaling to large networks. Inspired by Random Walk with Restart (RWR), a promising approach for link prediction, this paper proposes a set of path-based features and a supervised learning technique, called Supervised Random Walk with Restart (SRWR), to identify missing links. We show that by using these features, a classifier can successfully order the potential links by their closeness to the query node. A new type of heterogeneous network, called the Generalized Bi-relation Network (GBN), is defined in this paper, upon which the novel structural features are introduced. Finally, experiments are performed on a disease-chemical-gene interaction network; the results show that SRWR significantly outperforms the standard RWR algorithm in terms of Area Under the ROC Curve (AUC) and performs as well as or better than the best algorithms in the field of gene prioritization.
UR - http://www.scopus.com/inward/record.url?scp=84959312493&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959312493&partnerID=8YFLogxK
U2 - 10.1007/978-981-10-0080-5_10
DO - 10.1007/978-981-10-0080-5_10
M3 - Conference contribution
AN - SCOPUS:84959312493
SN - 9789811000799
VL - 568
T3 - Communications in Computer and Information Science
SP - 107
EP - 118
BT - Communications in Computer and Information Science
PB - Springer Verlag
T2 - 4th National Conference on Social Media Processing, SMP 2015
Y2 - 16 November 2015 through 17 November 2015
ER -