Canonical correlation analysis for multilabel classification: A least-squares formulation, extensions, and analysis

Liang Sun, Shuiwang Ji, Jieping Ye

Research output: Contribution to journal › Article › peer-review

265 Scopus citations

Abstract

Canonical Correlation Analysis (CCA) is a well-known technique for finding the correlations between two sets of multidimensional variables. It projects both sets of variables onto a lower-dimensional space in which they are maximally correlated. CCA is commonly applied for supervised dimensionality reduction, in which the two sets of variables are derived from the data and the class labels, respectively. It is well known that CCA can be formulated as a least-squares problem in the binary-class case. However, the extension to the more general setting remains unclear. In this paper, we show that under a mild condition, which tends to hold for high-dimensional data, CCA in the multilabel case can be formulated as a least-squares problem. Based on this equivalence relationship, efficient algorithms for solving least-squares problems can be applied to scale CCA to very large data sets. In addition, we propose several CCA extensions, including a sparse CCA formulation based on 1-norm regularization. We further extend the least-squares formulation to partial least squares. We also show that the CCA projection for one set of variables is independent of the regularization on the other set of multidimensional variables, providing new insights into the effect of regularization on CCA. We have conducted experiments using benchmark data sets. Experiments on multilabel data sets confirm the established equivalence relationships. Results also demonstrate the effectiveness and efficiency of the proposed CCA extensions.
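The binary-class equivalence that the abstract describes as well known can be checked numerically. The following is a minimal sketch of that special case only, not the paper's multilabel algorithm. It assumes NumPy, a rows-as-samples layout (X is n x d), synthetic data, and more samples than features so that the feature scatter matrix is invertible. The function names `cca_direction_binary` and `least_squares_direction` are hypothetical and chosen for illustration.

```python
import numpy as np


def cca_direction_binary(X, y):
    """Top CCA direction for features X (n x d) and one binary label y (n,)."""
    Xc = X - X.mean(axis=0)          # center the features
    yc = y - y.mean()                # center the label
    Sxx = Xc.T @ Xc                  # d x d feature scatter (assumed invertible)
    Sxy = Xc.T @ yc                  # d-vector cross scatter
    Syy = yc @ yc                    # scalar label scatter
    # CCA eigenproblem: Sxx^{-1} Sxy Syy^{-1} Syx w = rho^2 w
    M = np.linalg.solve(Sxx, np.outer(Sxy, Sxy) / Syy)
    vals, vecs = np.linalg.eig(M)
    # M has rank one, so the largest eigenvalue is the only nonzero one.
    w = np.real(vecs[:, np.argmax(np.real(vals))])
    return w / np.linalg.norm(w)


def least_squares_direction(X, y):
    """Direction of the least-squares fit of the centered label on centered X."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return w / np.linalg.norm(w)


# Synthetic two-class data (an assumption for illustration, not from the paper).
rng = np.random.default_rng(0)
n, d = 200, 5
y = (rng.random(n) < 0.5).astype(float)
shift = np.array([1.0, -0.5, 0.0, 0.3, 0.0])
X = rng.normal(size=(n, d)) + np.outer(y, shift)

w_cca = cca_direction_binary(X, y)
w_ls = least_squares_direction(X, y)

# Absolute cosine between the two unit directions; the sign of an
# eigenvector is arbitrary, so only the absolute value is meaningful.
alignment = abs(w_cca @ w_ls)
print(alignment)
```

With a single label, both directions are proportional to the same vector, so the printed alignment should equal 1 up to floating-point error. The multilabel equivalence stated in the abstract requires the additional condition the authors describe and is not reproduced by this sketch.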

Original language: English (US)
Article number: 5557883
Pages (from-to): 194-200
Number of pages: 7
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume: 33
Issue number: 1
State: Published - 2011

Keywords

  • Canonical correlation analysis
  • least squares
  • multilabel learning
  • partial least squares
  • regularization

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Computational Theory and Mathematics
  • Artificial Intelligence
  • Applied Mathematics
