Simultaneous feature and feature group selection through hard thresholding

Shuo Xiang; Tao Yang; Jieping Ye

doi:10.1145/2623330.2623662

Computing and Augmented Intelligence, School of (IAFSE-SCAI)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Scopus citations

Abstract

Selecting an informative subset of features has important applications in many data mining tasks, especially for high-dimensional data. Recently, simultaneous selection of features and feature groups (a.k.a. bi-level selection) has become increasingly popular, since it not only reduces the number of features but also unveils the underlying grouping effect in the data, which is valuable in many applications such as bioinformatics and web data mining. One major challenge of bi-level selection (or even feature selection alone) is that computing a globally optimal solution is computationally prohibitive. Current research addressing this challenge mainly falls into two categories. The first focuses on finding suitable continuous computational surrogates for the discrete functions, which leads to various convex and nonconvex optimization models. Although efficient, convex models usually deliver sub-optimal performance, while nonconvex models require significantly more computational effort. The second uses greedy algorithms to solve the discrete optimization problem directly. However, existing greedy algorithms handle single-level selection only, and it remains challenging to extend them to bi-level selection. In this paper, we fill this gap by introducing an efficient sparse group hard thresholding algorithm. Our main contributions are: (1) we propose a novel bi-level selection model and show that the key combinatorial problem admits a globally optimal solution using dynamic programming; (2) we provide an error bound between our solution and the globally optimal one under the RIP (Restricted Isometry Property) theoretical framework. Our experiments on synthetic and real data demonstrate that the proposed algorithm produces encouraging performance while keeping computational efficiency comparable to that of convex relaxation models.
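
To make the dynamic-programming claim in the abstract concrete, here is a minimal Python sketch of one way such a bi-level hard-thresholding step could look. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formulation, so the sketch assumes non-overlapping groups, a budget of at most `max_features` nonzero entries spread over at most `max_groups` groups, and a squared-error projection objective. The function name and all parameter names are hypothetical.

```python
import numpy as np


def sparse_group_hard_threshold(v, groups, max_features, max_groups):
    """Hypothetical sketch: keep the entries of v that retain the most energy,
    using at most max_features nonzeros in at most max_groups groups.

    Assumes `groups` is a list of non-overlapping index lists.
    """
    v = np.asarray(v, dtype=float)
    # Within each group, order indices by decreasing magnitude; gain[i][k] is
    # the energy retained by keeping the k largest entries of group i.
    order = [sorted(g, key=lambda j: -abs(v[j])) for g in groups]
    gain = [np.concatenate(([0.0], np.cumsum(v[o] ** 2))) for o in order]

    n = len(groups)
    # best[i, g, t]: maximum retained energy over the first i groups with
    # exactly g active groups and exactly t kept features (-inf if infeasible).
    best = np.full((n + 1, max_groups + 1, max_features + 1), -np.inf)
    best[0, 0, 0] = 0.0
    # choice[i, g, t]: number of features kept from group i in that optimum.
    choice = np.zeros((n + 1, max_groups + 1, max_features + 1), dtype=int)

    for i in range(1, n + 1):
        for g in range(max_groups + 1):
            for t in range(max_features + 1):
                # Option 1: leave group i entirely out.
                best[i, g, t] = best[i - 1, g, t]
                if g == 0:
                    continue
                # Option 2: activate group i and keep its k largest entries.
                for k in range(1, min(len(order[i - 1]), t) + 1):
                    cand = best[i - 1, g - 1, t - k] + gain[i - 1][k]
                    if cand > best[i, g, t]:
                        best[i, g, t] = cand
                        choice[i, g, t] = k

    # Pick the best feasible end state, then backtrack to recover the support.
    g, t = np.unravel_index(np.argmax(best[n]), best[n].shape)
    x = np.zeros_like(v)
    for i in range(n, 0, -1):
        k = choice[i, g, t]
        if k > 0:
            idx = order[i - 1][:k]
            x[idx] = v[idx]
            g -= 1
            t -= k
    return x


v = [3, -1, 0.5, 2, 2, 0.1]
groups = [[0, 1, 2], [3, 4], [5]]
print(sparse_group_hard_threshold(v, groups, max_features=3, max_groups=2))
print(sparse_group_hard_threshold(v, groups, max_features=3, max_groups=1))
```

With two groups allowed, the sketch keeps the three largest entries, which span two groups; with only one group allowed, it keeps all of the first group instead. That difference is the point of bi-level selection: the group budget changes which features survive. In an iterative scheme, a step like this would typically be applied after each gradient update, though that outer loop is not shown here.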

Original language	English (US)
Title of host publication	KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher	Association for Computing Machinery
Pages	532-541
Number of pages	10
ISBN (Print)	9781450329569
DOIs	https://doi.org/10.1145/2623330.2623662
State	Published - 2014
Event	20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014 - New York, NY, United States Duration: Aug 24 2014 → Aug 27 2014

Publication series

Name	Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Conference

Conference	20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014
Country/Territory	United States
City	New York, NY
Period	8/24/14 → 8/27/14

Keywords

bi-level learning
combinatorics
dynamic programming
feature selection
optimization
supervised learning

ASJC Scopus subject areas

Software
Information Systems

Access to Document

10.1145/2623330.2623662

Cite this

Xiang, S., Yang, T., & Ye, J. (2014). Simultaneous feature and feature group selection through hard thresholding. In KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 532-541). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). Association for Computing Machinery. https://doi.org/10.1145/2623330.2623662

Simultaneous feature and feature group selection through hard thresholding. / Xiang, Shuo; Yang, Tao; Ye, Jieping.
KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2014. p. 532-541 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).

Xiang, S, Yang, T & Ye, J 2014, Simultaneous feature and feature group selection through hard thresholding. in KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, pp. 532-541, 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014, New York, NY, United States, 8/24/14. https://doi.org/10.1145/2623330.2623662

Xiang S, Yang T, Ye J. Simultaneous feature and feature group selection through hard thresholding. In KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. 2014. p. 532-541. (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). doi: 10.1145/2623330.2623662

Xiang, Shuo ; Yang, Tao ; Ye, Jieping. / Simultaneous feature and feature group selection through hard thresholding. KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2014. pp. 532-541 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).

@inproceedings{3f1111542aa24dba9d77a3890561a019,

title = "Simultaneous feature and feature group selection through hard thresholding",

abstract = "Selecting an informative subset of features has important applications in many data mining tasks, especially for high-dimensional data. Recently, simultaneous selection of features and feature groups (a.k.a. bi-level selection) has become increasingly popular, since it not only reduces the number of features but also unveils the underlying grouping effect in the data, which is valuable in many applications such as bioinformatics and web data mining. One major challenge of bi-level selection (or even feature selection alone) is that computing a globally optimal solution is computationally prohibitive. Current research addressing this challenge mainly falls into two categories. The first focuses on finding suitable continuous computational surrogates for the discrete functions, which leads to various convex and nonconvex optimization models. Although efficient, convex models usually deliver sub-optimal performance, while nonconvex models require significantly more computational effort. The second uses greedy algorithms to solve the discrete optimization problem directly. However, existing greedy algorithms handle single-level selection only, and it remains challenging to extend them to bi-level selection. In this paper, we fill this gap by introducing an efficient sparse group hard thresholding algorithm. Our main contributions are: (1) we propose a novel bi-level selection model and show that the key combinatorial problem admits a globally optimal solution using dynamic programming; (2) we provide an error bound between our solution and the globally optimal one under the RIP (Restricted Isometry Property) theoretical framework. Our experiments on synthetic and real data demonstrate that the proposed algorithm produces encouraging performance while keeping computational efficiency comparable to that of convex relaxation models.",

keywords = "bi-level learning, combinatorics, dynamic programming, feature selection, optimization, supervised learning",

author = "Shuo Xiang and Tao Yang and Jieping Ye",

year = "2014",

doi = "10.1145/2623330.2623662",

language = "English (US)",

isbn = "9781450329569",

series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

publisher = "Association for Computing Machinery",

pages = "532--541",

booktitle = "KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

note = "20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014 ; Conference date: 24-08-2014 Through 27-08-2014",

}

TY - GEN

T1 - Simultaneous feature and feature group selection through hard thresholding

AU - Xiang, Shuo

AU - Yang, Tao

AU - Ye, Jieping

PY - 2014

Y1 - 2014

AB - Selecting an informative subset of features has important applications in many data mining tasks, especially for high-dimensional data. Recently, simultaneous selection of features and feature groups (a.k.a. bi-level selection) has become increasingly popular, since it not only reduces the number of features but also unveils the underlying grouping effect in the data, which is valuable in many applications such as bioinformatics and web data mining. One major challenge of bi-level selection (or even feature selection alone) is that computing a globally optimal solution is computationally prohibitive. Current research addressing this challenge mainly falls into two categories. The first focuses on finding suitable continuous computational surrogates for the discrete functions, which leads to various convex and nonconvex optimization models. Although efficient, convex models usually deliver sub-optimal performance, while nonconvex models require significantly more computational effort. The second uses greedy algorithms to solve the discrete optimization problem directly. However, existing greedy algorithms handle single-level selection only, and it remains challenging to extend them to bi-level selection. In this paper, we fill this gap by introducing an efficient sparse group hard thresholding algorithm. Our main contributions are: (1) we propose a novel bi-level selection model and show that the key combinatorial problem admits a globally optimal solution using dynamic programming; (2) we provide an error bound between our solution and the globally optimal one under the RIP (Restricted Isometry Property) theoretical framework. Our experiments on synthetic and real data demonstrate that the proposed algorithm produces encouraging performance while keeping computational efficiency comparable to that of convex relaxation models.

KW - bi-level learning

KW - combinatorics

KW - dynamic programming

KW - feature selection

KW - optimization

KW - supervised learning

UR - http://www.scopus.com/inward/record.url?scp=84907033131&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84907033131&partnerID=8YFLogxK

U2 - 10.1145/2623330.2623662

DO - 10.1145/2623330.2623662

M3 - Conference contribution

AN - SCOPUS:84907033131

SN - 9781450329569

T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 532

EP - 541

BT - KDD 2014 - Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

T2 - 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2014

Y2 - 24 August 2014 through 27 August 2014

ER -
