TY - GEN
T1 - Regularized discriminant analysis for high dimensional, low sample size data
AU - Ye, Jieping
AU - Wang, Tie
N1 - Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2006
Y1 - 2006
N2 - Linear and Quadratic Discriminant Analysis have been used widely in many areas of data mining, machine learning, and bioinformatics. Friedman proposed a compromise between Linear and Quadratic Discriminant Analysis, called Regularized Discriminant Analysis (RDA), which has been shown to be more flexible in dealing with various class distributions. RDA applies regularization through two regularization parameters, which are chosen to jointly maximize classification performance. The optimal pair of parameters is commonly estimated via cross-validation from a set of candidate pairs. This is computationally prohibitive for high dimensional data, especially when the candidate set is large, which limits the application of RDA to low dimensional data. In this paper, a novel algorithm for RDA is presented for high dimensional data. It can efficiently estimate the optimal regularization parameters from a large set of parameter candidates. Experiments on a variety of datasets confirm the claimed theoretical estimate of the efficiency and show that, for a properly chosen pair of regularization parameters, RDA performs favorably in classification in comparison with other existing classification methods.
AB - Linear and Quadratic Discriminant Analysis have been used widely in many areas of data mining, machine learning, and bioinformatics. Friedman proposed a compromise between Linear and Quadratic Discriminant Analysis, called Regularized Discriminant Analysis (RDA), which has been shown to be more flexible in dealing with various class distributions. RDA applies regularization through two regularization parameters, which are chosen to jointly maximize classification performance. The optimal pair of parameters is commonly estimated via cross-validation from a set of candidate pairs. This is computationally prohibitive for high dimensional data, especially when the candidate set is large, which limits the application of RDA to low dimensional data. In this paper, a novel algorithm for RDA is presented for high dimensional data. It can efficiently estimate the optimal regularization parameters from a large set of parameter candidates. Experiments on a variety of datasets confirm the claimed theoretical estimate of the efficiency and show that, for a properly chosen pair of regularization parameters, RDA performs favorably in classification in comparison with other existing classification methods.
KW - Cross-validation
KW - Dimensionality reduction
KW - Quadratic Discriminant Analysis
KW - Regularization
UR - http://www.scopus.com/inward/record.url?scp=33749543648&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33749543648&partnerID=8YFLogxK
U2 - 10.1145/1150402.1150453
DO - 10.1145/1150402.1150453
M3 - Conference contribution
AN - SCOPUS:33749543648
SN - 1595933395
SN - 9781595933393
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 454
EP - 463
BT - KDD 2006
PB - Association for Computing Machinery
T2 - KDD 2006: 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Y2 - 20 August 2006 through 23 August 2006
ER -