On similarity preserving feature selection

Zheng Zhao; Lei Wang; Huan Liu; Jieping Ye

doi:10.1109/TKDE.2011.222

Research output: Contribution to journal › Article › peer-review

272 Scopus citations

Abstract

In the literature of feature selection, different criteria have been proposed to evaluate the goodness of features. In our investigation, we notice that a number of existing selection criteria implicitly select features that preserve sample similarity, and can be unified under a common framework. We further point out that any feature selection criteria covered by this framework cannot handle redundant features, a common drawback of these criteria. Motivated by these observations, we propose a new 'Similarity Preserving Feature Selection' framework in an explicit and rigorous way. We show, through theoretical analysis, that the proposed framework not only encompasses many widely used feature selection criteria, but also naturally overcomes their common weakness in handling feature redundancy. In developing this new framework, we begin with a conventional combinatorial optimization formulation for similarity preserving feature selection, then extend it with a sparse multiple-output regression formulation to improve its efficiency and effectiveness. A set of three algorithms are devised to efficiently solve the proposed formulations, each of which has its own advantages in terms of computational complexity and selection performance. As exhibited by our extensive experimental study, the proposed framework achieves superior feature selection performance and attractive properties.
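
As a rough illustration of the abstract's first two observations, the Python sketch below scores each feature on its own by how well it aligns with a sample similarity matrix, and shows that an exact copy of a feature gets the same score as the original. The scoring form f^T K f, the label-based similarity matrix, the toy data, and all function names are assumptions made for illustration; they are not taken from the paper and are not the authors' algorithms.

```python
import math


def center_and_normalize(column):
    """Return the column shifted to zero mean and scaled to unit length."""
    mean = sum(column) / len(column)
    centered = [v - mean for v in column]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered] if norm > 0 else centered


def label_similarity(labels):
    """Assumed similarity: K[i][j] = 1/n_c if samples i and j share class c, else 0."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return [[1.0 / counts[a] if a == b else 0.0 for b in labels] for a in labels]


def similarity_score(column, K):
    """Per-feature score f^T K f, computed on the normalized feature vector f."""
    f = center_and_normalize(column)
    n = len(f)
    return sum(f[i] * K[i][j] * f[j] for i in range(n) for j in range(n))


def rank_features(columns, K):
    """Rank features independently by score, highest first (ties keep input order)."""
    scores = [similarity_score(c, K) for c in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (hypothetical): six samples in two classes, three features.
labels = [0, 0, 0, 1, 1, 1]
informative = [1.0, 1.2, 0.9, 3.0, 3.1, 2.8]   # separates the two classes
duplicate = list(informative)                   # exact copy, fully redundant
noise = [0.5, 2.9, 1.7, 0.6, 3.0, 1.8]          # unrelated to the classes

K = label_similarity(labels)
order, scores = rank_features([informative, duplicate, noise], K)
print(order, [round(s, 3) for s in scores])
```

Because each feature is scored independently, the duplicated feature ties with the original and both rank above the noise feature, so a top-2 selection under this sketch would keep two copies of the same information. This is the kind of redundancy weakness the abstract attributes to criteria that evaluate features one at a time.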

Original language	English (US)
Article number	6051436
Pages (from-to)	619-632
Number of pages	14
Journal	IEEE Transactions on Knowledge and Data Engineering
Volume	25
Issue number	3
DOIs	https://doi.org/10.1109/TKDE.2011.222
State	Published - 2013

Keywords

Feature selection
multiple output regression
redundancy removal
similarity preserving
sparse regularization

ASJC Scopus subject areas

Information Systems
Computer Science Applications
Computational Theory and Mathematics

Cite this

@article{51dfa9fa7daf49c38fbfabc6c5323078,

title = "On similarity preserving feature selection",

abstract = "In the literature of feature selection, different criteria have been proposed to evaluate the goodness of features. In our investigation, we notice that a number of existing selection criteria implicitly select features that preserve sample similarity, and can be unified under a common framework. We further point out that any feature selection criteria covered by this framework cannot handle redundant features, a common drawback of these criteria. Motivated by these observations, we propose a new 'Similarity Preserving Feature Selection' framework in an explicit and rigorous way. We show, through theoretical analysis, that the proposed framework not only encompasses many widely used feature selection criteria, but also naturally overcomes their common weakness in handling feature redundancy. In developing this new framework, we begin with a conventional combinatorial optimization formulation for similarity preserving feature selection, then extend it with a sparse multiple-output regression formulation to improve its efficiency and effectiveness. A set of three algorithms are devised to efficiently solve the proposed formulations, each of which has its own advantages in terms of computational complexity and selection performance. As exhibited by our extensive experimental study, the proposed framework achieves superior feature selection performance and attractive properties.",

keywords = "Feature selection, multiple output regression, redundancy removal, similarity preserving, sparse regularization",

author = "Zheng Zhao and Lei Wang and Huan Liu and Jieping Ye",

year = "2013",

doi = "10.1109/TKDE.2011.222",

language = "English (US)",

volume = "25",

pages = "619--632",

journal = "IEEE Transactions on Knowledge and Data Engineering",

issn = "1041-4347",

publisher = "IEEE Computer Society",

number = "3",

}

TY - JOUR

T1 - On similarity preserving feature selection

AU - Zhao, Zheng

AU - Wang, Lei

AU - Liu, Huan

AU - Ye, Jieping

PY - 2013

Y1 - 2013

N2 - In the literature of feature selection, different criteria have been proposed to evaluate the goodness of features. In our investigation, we notice that a number of existing selection criteria implicitly select features that preserve sample similarity, and can be unified under a common framework. We further point out that any feature selection criteria covered by this framework cannot handle redundant features, a common drawback of these criteria. Motivated by these observations, we propose a new 'Similarity Preserving Feature Selection' framework in an explicit and rigorous way. We show, through theoretical analysis, that the proposed framework not only encompasses many widely used feature selection criteria, but also naturally overcomes their common weakness in handling feature redundancy. In developing this new framework, we begin with a conventional combinatorial optimization formulation for similarity preserving feature selection, then extend it with a sparse multiple-output regression formulation to improve its efficiency and effectiveness. A set of three algorithms are devised to efficiently solve the proposed formulations, each of which has its own advantages in terms of computational complexity and selection performance. As exhibited by our extensive experimental study, the proposed framework achieves superior feature selection performance and attractive properties.

AB - In the literature of feature selection, different criteria have been proposed to evaluate the goodness of features. In our investigation, we notice that a number of existing selection criteria implicitly select features that preserve sample similarity, and can be unified under a common framework. We further point out that any feature selection criteria covered by this framework cannot handle redundant features, a common drawback of these criteria. Motivated by these observations, we propose a new 'Similarity Preserving Feature Selection' framework in an explicit and rigorous way. We show, through theoretical analysis, that the proposed framework not only encompasses many widely used feature selection criteria, but also naturally overcomes their common weakness in handling feature redundancy. In developing this new framework, we begin with a conventional combinatorial optimization formulation for similarity preserving feature selection, then extend it with a sparse multiple-output regression formulation to improve its efficiency and effectiveness. A set of three algorithms are devised to efficiently solve the proposed formulations, each of which has its own advantages in terms of computational complexity and selection performance. As exhibited by our extensive experimental study, the proposed framework achieves superior feature selection performance and attractive properties.

KW - Feature selection

KW - multiple output regression

KW - redundancy removal

KW - similarity preserving

KW - sparse regularization

UR - http://www.scopus.com/inward/record.url?scp=84873278481&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84873278481&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2011.222

DO - 10.1109/TKDE.2011.222

M3 - Article

AN - SCOPUS:84873278481

SN - 1041-4347

VL - 25

SP - 619

EP - 632

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

IS - 3

M1 - 6051436

ER -
