TY - GEN
T1 - GI-NMF
T2 - 23rd ACM International Conference on Information and Knowledge Management, CIKM 2014
AU - Chen, Xilun
AU - Candan, Kasim
N1 - Publisher Copyright:
Copyright 2014 ACM.
PY - 2014/11/3
Y1 - 2014/11/3
N2 - Non-negative matrix factorization (NMF) is a well known method for obtaining low rank approximations of data sets, which can then be used for efficient indexing, classification, and retrieval. The non-negativity constraints enable probabilistic interpretation of the results and discovery of generative models. One key disadvantage of the NMF, however, is that it is costly to obtain and this makes it difficult to apply NMF in applications where data is dynamic. In this paper, we recognize that many applications involve redundancies and we argue that these redundancies can and should be leveraged for reducing the computational cost of the NMF process: Firstly, online applications involving data streams often include temporal redundancies. Secondly, and perhaps less obviously, many applications include integration of multiple data streams (with potential overlaps) and/or involves tracking of multiple similar (but different) queries; this leads to significant data and query redundancies, which if leveraged properly can help alleviate computational cost of NMF. Based on these observations, we introduce Group Incremental Non-Negative Matrix Factorization (GI-NMF) which leverages redundancies across multiple NMF tasks over data streams. The proposed algorithm relies on a novel group multiplicative update rules (G-MUR) method to significantly reduce the cost of NMF. G-MUR is further complemented to support incremental update of the factors where data evolves continuously. Experiments show that GI-NMF significantly reduces the processing time, with minimal error overhead.
AB - Non-negative matrix factorization (NMF) is a well known method for obtaining low rank approximations of data sets, which can then be used for efficient indexing, classification, and retrieval. The non-negativity constraints enable probabilistic interpretation of the results and discovery of generative models. One key disadvantage of the NMF, however, is that it is costly to obtain and this makes it difficult to apply NMF in applications where data is dynamic. In this paper, we recognize that many applications involve redundancies and we argue that these redundancies can and should be leveraged for reducing the computational cost of the NMF process: Firstly, online applications involving data streams often include temporal redundancies. Secondly, and perhaps less obviously, many applications include integration of multiple data streams (with potential overlaps) and/or involves tracking of multiple similar (but different) queries; this leads to significant data and query redundancies, which if leveraged properly can help alleviate computational cost of NMF. Based on these observations, we introduce Group Incremental Non-Negative Matrix Factorization (GI-NMF) which leverages redundancies across multiple NMF tasks over data streams. The proposed algorithm relies on a novel group multiplicative update rules (G-MUR) method to significantly reduce the cost of NMF. G-MUR is further complemented to support incremental update of the factors where data evolves continuously. Experiments show that GI-NMF significantly reduces the processing time, with minimal error overhead.
UR - http://www.scopus.com/inward/record.url?scp=84937566005&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937566005&partnerID=8YFLogxK
U2 - 10.1145/2661829.2662008
DO - 10.1145/2661829.2662008
M3 - Conference contribution
AN - SCOPUS:84937566005
T3 - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
SP - 1119
EP - 1128
BT - CIKM 2014 - Proceedings of the 2014 ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
Y2 - 3 November 2014 through 7 November 2014
ER -