TY - JOUR
T1 - Large-scale supervised similarity learning in networks
AU - Chang, Shiyu
AU - Qi, Guo Jun
AU - Yang, Yingzhen
AU - Aggarwal, Charu C.
AU - Zhou, Jiayu
AU - Wang, Meng
AU - Huang, Thomas S.
N1 - Funding Information:
The work of Shiyu Chang and Thomas S. Huang was funded in part by the National Science Foundation under Grant Number 1318971 and the Samsung Global Research Program 2013 under Theme “Big Data and Network,” Subject “Privacy and Trust Management In Big Data Analysis.” This work was partially sponsored by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053.
Publisher Copyright:
© 2015, Springer-Verlag London.
PY - 2016/9/1
Y1 - 2016/9/1
N2 - The problem of similarity learning is relevant to many data mining applications, such as recommender systems, classification, and retrieval. This problem is particularly challenging in the context of networks, which contain different aspects such as topological structure, content, and user supervision. These aspects need to be combined effectively in order to create a holistic similarity function. In particular, while most similarity learning methods in networks, such as SimRank, utilize the topological structure, user supervision and content are rarely considered. In this paper, a factorized similarity learning (FSL) method is proposed to integrate the link, node content, and user supervision into a unified framework. The similarity function is learned by matrix factorization, and the final similarities are approximated by the span of low-rank matrices. The framework is further extended to a noise-tolerant version by adopting a hinge loss instead. To facilitate efficient computation on large-scale data, a parallel extension is developed. Experiments are conducted on the DBLP and CoRA data sets. The results show that FSL is robust and efficient and outperforms the state of the art. The code for the learning algorithm used in our experiments is available at http://www.ifp.illinois.edu/~chang87/.
AB - The problem of similarity learning is relevant to many data mining applications, such as recommender systems, classification, and retrieval. This problem is particularly challenging in the context of networks, which contain different aspects such as topological structure, content, and user supervision. These aspects need to be combined effectively in order to create a holistic similarity function. In particular, while most similarity learning methods in networks, such as SimRank, utilize the topological structure, user supervision and content are rarely considered. In this paper, a factorized similarity learning (FSL) method is proposed to integrate the link, node content, and user supervision into a unified framework. The similarity function is learned by matrix factorization, and the final similarities are approximated by the span of low-rank matrices. The framework is further extended to a noise-tolerant version by adopting a hinge loss instead. To facilitate efficient computation on large-scale data, a parallel extension is developed. Experiments are conducted on the DBLP and CoRA data sets. The results show that FSL is robust and efficient and outperforms the state of the art. The code for the learning algorithm used in our experiments is available at http://www.ifp.illinois.edu/~chang87/.
KW - Large-scale network
KW - Link content consistency
KW - Supervised matrix factorization
KW - Supervised network embedding
KW - Supervised network similarity learning
UR - http://www.scopus.com/inward/record.url?scp=84944916452&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84944916452&partnerID=8YFLogxK
U2 - 10.1007/s10115-015-0894-8
DO - 10.1007/s10115-015-0894-8
M3 - Article
AN - SCOPUS:84944916452
SN - 0219-1377
VL - 48
SP - 707
EP - 740
JO - Knowledge and Information Systems
JF - Knowledge and Information Systems
IS - 3
ER -