Multi-task feature learning via efficient ℓ2,1-norm minimization

Jun Liu, Shuiwang Ji, Jieping Ye

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

525 Scopus citations

Abstract

The problem of joint feature selection across a group of related tasks has applications in many areas, including biomedical informatics and computer vision. We consider the ℓ2,1-norm regularized regression model for joint feature selection from multiple tasks, which can be derived in the probabilistic framework by assuming a suitable prior from the exponential family. One appealing feature of ℓ2,1-norm regularization is that it encourages multiple predictors to share similar sparsity patterns. However, the resulting optimization problem is challenging to solve because the ℓ2,1-norm regularizer is non-smooth. In this paper, we propose to accelerate the computation by reformulating the problem as two equivalent smooth convex optimization problems, which are then solved via Nesterov's method, an optimal first-order black-box method for smooth convex optimization. A key building block in solving the reformulations is the Euclidean projection. We show that the Euclidean projection for the first reformulation can be computed analytically, while the Euclidean projection for the second can be computed in linear time. Empirical evaluations on several data sets verify the efficiency of the proposed algorithms.
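As a rough illustration of the quantities the abstract mentions, the sketch below computes the ℓ2,1-norm of a weight matrix (rows are features, columns are tasks) and a Euclidean projection onto an ℓ2,1-norm ball. It is not the authors' algorithm: the abstract does not spell out the two reformulations, so the ball-shaped constraint set is an assumption, the function names are hypothetical, and the projection uses a simple sort-based routine that runs in O(n log n) time, not the linear-time procedure the paper reports.

```python
import math


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (a list of rows)."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in W)


def project_l1_ball(v, z):
    """Project a nonnegative vector v onto {u >= 0 : sum(u) <= z}, z > 0.

    Sort-based variant (O(n log n)); an assumption made for brevity,
    not the linear-time method described in the paper.
    """
    if sum(v) <= z:
        return list(v)
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - z) / k
        if uk - t > 0:
            theta = t  # keep the threshold from the last valid prefix
    return [max(x - theta, 0.0) for x in v]


def project_l21_ball(W, z):
    """Euclidean projection of W onto {V : l21_norm(V) <= z}, z > 0.

    The row norms are projected onto an l1 ball, then each row is
    rescaled to its new norm; rows whose norm shrinks to zero are
    zeroed out for every task at once.
    """
    norms = [math.sqrt(sum(x * x for x in row)) for row in W]
    new_norms = project_l1_ball(norms, z)
    result = []
    for row, old, new in zip(W, norms, new_norms):
        scale = new / old if old > 0 else 0.0
        result.append([x * scale for x in row])
    return result


if __name__ == "__main__":
    W = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]  # 3 features, 2 tasks
    print(l21_norm(W))
    print(project_l21_ball(W, 9.0))
```

Because the shrinkage acts on whole rows, a feature is either kept for all tasks or dropped for all tasks, which is the shared sparsity pattern the abstract refers to.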

Original language: English (US)
Title of host publication: Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009
Pages: 339-348
Number of pages: 10
State: Published - 2009
Event: 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009 - Montreal, QC, Canada
Duration: Jun 18 2009 to Jun 21 2009

Other

Other: 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009
Country: Canada
City: Montreal, QC
Period: 6/18/09 to 6/21/09

ASJC Scopus subject areas

  • Artificial Intelligence
  • Applied Mathematics
