TY - JOUR
T1 - IDR/QR: An incremental dimension reduction algorithm via QR decomposition
T2 - IEEE Transactions on Knowledge and Data Engineering
AU - Ye, Jieping
AU - Li, Qi
AU - Xiong, Hui
AU - Park, Haesun
AU - Janardan, Ravi
AU - Kumar, Vipin
N1 - Funding Information:
This research was sponsored, in part, by the Army High Performance Computing Research Center under the auspices of the US Department of the Army, Army Research Laboratory cooperative agreement number DAAD19-01-2-0014, and the US National Science Foundation Grants CCR-0204109, ACI-0305543, IIS-0308264, and DOE/LLNL W-7045-ENG-48. The work of Haesun Park was performed while at the National Science Foundation (NSF) and was partly supported by IR/D from NSF. The content of this work does not necessarily reflect the position or policy of the government and the NSF, and no official endorsement should be inferred. Access to computing facilities was provided by the AHPCRC and the Minnesota Supercomputing Institute. A preliminary version of this paper appears in the Proceedings of the 10th
PY - 2005/9
Y1 - 2005/9
AB - Dimension reduction is a critical data preprocessing step for many database and data mining applications, such as efficient storage and retrieval of high-dimensional data. In the literature, a well-known dimension reduction algorithm is Linear Discriminant Analysis (LDA). The common aspect of previously proposed LDA-based algorithms is the use of Singular Value Decomposition (SVD). Due to the difficulty of designing an incremental solution for the eigenvalue problem on the product of scatter matrices in LDA, there has been little work on designing incremental LDA algorithms that can efficiently incorporate new data items as they become available. In this paper, we propose an LDA-based incremental dimension reduction algorithm, called IDR/QR, which applies QR Decomposition rather than SVD. Unlike other LDA-based algorithms, this algorithm does not require the whole data matrix in main memory. This is desirable for large data sets. More importantly, with the insertion of new data items, the IDR/QR algorithm can constrain the computational cost by applying efficient QR-updating techniques. Finally, we evaluate the effectiveness of the IDR/QR algorithm in terms of classification error rate on the reduced-dimensional space. Our experiments on several real-world data sets reveal that the classification error rate achieved by the IDR/QR algorithm is very close to the best possible one achieved by other LDA-based algorithms. However, the IDR/QR algorithm has much less computational cost, especially when new data items are inserted dynamically.
KW - Dimension reduction
KW - Incremental learning
KW - Linear discriminant analysis
KW - QR decomposition
KW - Singular value decomposition (SVD)
UR - http://www.scopus.com/inward/record.url?scp=33947660936&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33947660936&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2005.148
DO - 10.1109/TKDE.2005.148
M3 - Article
AN - SCOPUS:33947660936
SN - 1041-4347
VL - 17
SP - 1208
EP - 1221
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 9
ER -