### Abstract

Linear regression is a widely used tool in data mining and machine learning. In many applications, fitting a regression model with only linear effects may not be sufficient for predictive or explanatory purposes. One strategy that has recently received increasing attention in statistics is to include feature interactions to capture nonlinearity in the regression model. Such models have been applied successfully in many biomedical applications. One major challenge in using such models is that the dimensionality of the expanded feature set is significantly higher than that of the original data, resulting in the small-sample-size, large-dimension problem. Recently, the weak hierarchical Lasso, a sparse interaction regression model, was proposed; it produces a sparse and hierarchically structured estimator by exploiting the Lasso penalty and a set of hierarchical constraints. However, the hierarchical constraints make the problem non-convex, and the existing method finds the solution of its convex relaxation, which needs additional conditions to guarantee the hierarchical structure. In this paper, we propose to directly solve the non-convex weak hierarchical Lasso by making use of the GIST (General Iterative Shrinkage and Thresholding) optimization framework, which has been shown to be efficient for solving non-convex sparse formulations. The key step in GIST is to compute a sequence of proximal operators. One of our key technical contributions is to show that the proximal operator associated with the non-convex weak hierarchical Lasso admits a closed-form solution. However, a naive approach to solving each subproblem of the proximal operator leads to quadratic time complexity, which is not desirable for large-scale problems. To this end, we further develop an efficient algorithm for computing the subproblems with linearithmic time complexity. We have conducted extensive experiments on both synthetic and real data sets. Results show that our proposed algorithm is much more efficient and effective than solving the convex relaxation.
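As a rough illustration of two ideas in the abstract (the growth in dimensionality when interactions are added, and an iterative shrinkage and thresholding loop built around a proximal operator), here is a minimal Python sketch. It is not the algorithm from the paper: it uses the soft-thresholding operator of the plain Lasso penalty rather than the closed-form operator for the weak hierarchical Lasso, it ignores the hierarchical constraints, and it uses a fixed step size instead of the step-size search used in GIST. The function names `add_pairwise_interactions`, `soft_threshold`, and `ista_lasso` are hypothetical and chosen only for this example.

```python
from itertools import combinations


def add_pairwise_interactions(rows):
    """Append every product x_i * x_j (i < j) to each row of main effects."""
    return [
        list(r) + [r[i] * r[j] for i, j in combinations(range(len(r)), 2)]
        for r in rows
    ]


def soft_threshold(v, t):
    """Proximal operator of t * |v| for one coordinate (plain Lasso only)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista_lasso(X, y, lam, step=None, n_iter=500):
    """Proximal gradient for 0.5 * ||y - Xw||^2 + lam * ||w||_1.

    Assumption: a fixed step of 1 / ||X||_F^2, which never exceeds 1 / L
    (L = largest eigenvalue of X^T X), so the iteration is stable.
    """
    n, p = len(X), len(X[0])
    if step is None:
        frob_sq = sum(x * x for row in X for x in row)
        step = 1.0 / frob_sq if frob_sq > 0 else 1.0
    w = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the smooth least-squares term: X^T (Xw - y).
        resid = [sum(X[i][k] * w[k] for k in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][k] * resid[i] for i in range(n)) for k in range(p)]
        # Gradient step followed by the proximal (shrinkage) step.
        w = [soft_threshold(w[k] - step * grad[k], step * lam) for k in range(p)]
    return w


if __name__ == "__main__":
    X_main = [[1.0, 2.0, 3.0], [0.5, -1.0, 2.0], [2.0, 0.0, -1.0]]
    X_full = add_pairwise_interactions(X_main)
    print(len(X_main[0]), "main effects ->", len(X_full[0]), "columns")
    print(ista_lasso(X_full, [1.0, 0.0, 2.0], lam=0.1))
```

With `d` main effects, the expanded design has `d + d(d-1)/2` columns, so 1,000 original features already yield 500,500 columns. This is the dimensionality growth that motivates the sparse formulation.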

Original language | English (US) |
---|---|
Title of host publication | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Publisher | Association for Computing Machinery |
Pages | 283-292 |
Number of pages | 10 |
ISBN (Print) | 9781450329569 |
DOIs | https://doi.org/10.1145/2623330.2623665 |
State | Published - 2014 |
Event | 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014, New York, NY, United States; Aug 24 2014 → Aug 27 2014 |

### Other

Other | 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014 |
---|---|
Country | United States |
City | New York, NY |
Period | 8/24/14 → 8/27/14 |

### Keywords

- non-convex
- proximal operator
- sparse learning
- weak hierarchical lasso

### ASJC Scopus subject areas

- Software
- Information Systems

### Cite this

Liu, Y., Wang, J., & Ye, J. (2014). An efficient algorithm for weak hierarchical Lasso. In *Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining* (pp. 283-292). Association for Computing Machinery. https://doi.org/10.1145/2623330.2623665

**An efficient algorithm for weak hierarchical Lasso.** / Liu, Yashu; Wang, Jie; Ye, Jieping.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.* Association for Computing Machinery, pp. 283-292, 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014, New York, NY, United States, 8/24/14. https://doi.org/10.1145/2623330.2623665

TY - GEN

T1 - An efficient algorithm for weak hierarchical Lasso

AU - Liu, Yashu

AU - Wang, Jie

AU - Ye, Jieping

PY - 2014

Y1 - 2014

KW - non-convex

KW - proximal operator

KW - sparse learning

KW - weak hierarchical lasso

UR - http://www.scopus.com/inward/record.url?scp=84907029325&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84907029325&partnerID=8YFLogxK

U2 - 10.1145/2623330.2623665

DO - 10.1145/2623330.2623665

M3 - Conference contribution

SN - 9781450329569

SP - 283

EP - 292

BT - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

ER -