Abstract
In this paper, we consider the problem of learning from multiple related tasks for improved generalization performance by extracting their shared structures. The alternating structure optimization (ASO) algorithm, which couples all tasks using a shared feature representation, has been successfully applied in various multitask learning problems. However, ASO is nonconvex and the alternating algorithm only finds a local solution. We first present an improved ASO formulation (iASO) for multitask learning based on a new regularizer. We then convert (iASO), a nonconvex formulation, into a relaxed convex one (rASO). Interestingly, our theoretical analysis reveals that (rASO) finds a globally optimal solution to its nonconvex counterpart (iASO) under certain conditions. (rASO) can be equivalently reformulated as a semidefinite program (SDP), which is, however, not scalable to large datasets. We propose to employ the block coordinate descent (BCD) method and the accelerated projected gradient (APG) algorithm separately to find the globally optimal solution to (rASO); we also develop efficient algorithms for solving the key subproblems involved in BCD and APG. The experiments on the Yahoo webpages datasets and the Drosophila gene expression pattern image datasets demonstrate the effectiveness and efficiency of the proposed algorithms and confirm our theoretical analysis.
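For readers unfamiliar with ASO, the sketch below shows the general shape of an ASO-style objective: each task t has a predictor w_t, and all tasks share a low-dimensional structure Θ with orthonormal rows. The notation (w_t, v_t, Θ, loss L, regularization parameters α and β) is our own illustrative reconstruction of this family of formulations, not an equation quoted from the paper.

```latex
\min_{\{w_t, v_t\},\ \Theta\Theta^{\mathsf T} = I}\ \sum_{t=1}^{m}
\left(
  \frac{1}{n_t}\sum_{i=1}^{n_t} L\!\left(w_t^{\mathsf T} x_{ti},\, y_{ti}\right)
  + \alpha\,\lVert w_t - \Theta^{\mathsf T} v_t \rVert^2
  + \beta\,\lVert w_t \rVert^2
\right)
```

The abstract also names the accelerated projected gradient (APG) method as one of the two scalable solvers for (rASO). Below is a minimal, generic APG (FISTA-style) loop in Python for intuition only: it assumes a smooth objective with gradient `grad_f`, Lipschitz constant `lipschitz`, and Euclidean projection `project` onto the feasible set, all placeholder names we introduce here. It is a sketch of the general technique, not the paper's specialized algorithm or its subproblem solvers.

```python
import numpy as np

def apg(grad_f, project, x0, lipschitz, num_iters=200):
    """Generic accelerated projected gradient (FISTA-style) sketch:
    minimize a smooth function over a convex set, given its gradient
    and the Euclidean projection onto the set."""
    x_prev = x = np.asarray(x0, dtype=float)
    t_prev = t = 1.0
    for _ in range(num_iters):
        # Extrapolation (momentum) step.
        y = x + ((t_prev - 1.0) / t) * (x - x_prev)
        # Gradient step with fixed step size 1/L, projected back onto the set.
        x_prev, x = x, project(y - grad_f(y) / lipschitz)
        # Nesterov momentum weight update.
        t_prev, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return x

# Toy usage (not from the paper): minimize ||Ax - b||^2 over x >= 0.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
L = 2.0 * np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the gradient
x_star = apg(lambda x: 2.0 * A.T @ (A @ x - b),
             lambda x: np.maximum(x, 0.0),  # projection onto the orthant
             np.zeros(5), L)
```

In the paper's setting the projection would be onto the convex feasible set of (rASO); the nonnegativity projection above merely stands in for illustration.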
| Original language | English (US) |
|---|---|
| Article number | 6296661 |
| Pages (from-to) | 1025-1035 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
| Volume | 35 |
| Issue number | 5 |
| DOIs | |
| State | Published - 2013 |
Keywords
- Multitask learning
- accelerated projected gradient
- alternating structure optimization
- shared predictive structure
ASJC Scopus subject areas
- Software
- Computer Vision and Pattern Recognition
- Computational Theory and Mathematics
- Artificial Intelligence
- Applied Mathematics