Dimensionality reduction of unsupervised data

M. Dash; Huan Liu; J. Yao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Dimensionality reduction is an important problem for efficient handling of large databases. Many feature selection methods exist for supervised data having class information. Little work has been done for dimensionality reduction of unsupervised data in which class information is not available. Principal Component Analysis (PCA) is often used. However, PCA creates new features. It is difficult to obtain intuitive understanding of the data using the new features only. In this paper we are concerned with the problem of determining and choosing the important original features for unsupervised data. Our method is based on the observation that removing an irrelevant feature from the feature set may not change the underlying concept of the data, but not so otherwise. We propose an entropy measure for ranking features, and conduct extensive experiments to show that our method is able to find the important features. Also it compares well with a similar feature ranking method (Relief) that requires class information unlike our method.
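The abstract does not state the entropy measure itself, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a pairwise similarity of the form exp(-alpha * distance), with alpha chosen so that the mean distance maps to similarity 0.5, and min-max scaling of each feature; both choices, and the names `pairwise_entropy` and `rank_features`, are illustrative assumptions. Each feature is removed in turn, and features whose removal leaves the data most disordered (highest entropy) are treated as most important.

```python
import numpy as np


def pairwise_entropy(X, eps=1e-12):
    """Entropy summed over all point pairs; lower means clearer structure."""
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    d = dist[np.triu_indices(len(X), k=1)]  # each unordered pair once
    mean_d = d.mean()
    if mean_d == 0.0:
        return 0.0  # all points coincide: no disorder to measure
    # Assumed similarity: exp(-alpha * d), equal to 0.5 at the mean distance.
    alpha = np.log(2.0) / mean_d
    s = np.clip(np.exp(-alpha * d), eps, 1.0 - eps)
    return float(-(s * np.log(s) + (1.0 - s) * np.log(1.0 - s)).sum())


def rank_features(X):
    """Return (feature indices from most to least important, entropy scores).

    scores[j] is the entropy of the data with feature j removed.
    """
    X = np.asarray(X, dtype=float)
    # Min-max scale each feature so no feature dominates the distances.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0.0] = 1.0  # constant features stay at zero
    X = (X - lo) / span
    scores = [
        pairwise_entropy(np.delete(X, j, axis=1)) for j in range(X.shape[1])
    ]
    # Higher entropy after removal means the removed feature carried structure.
    order = [int(j) for j in np.argsort(scores)[::-1]]
    return order, scores
```

Calling `rank_features(X)` on an array of shape `(n_samples, n_features)` returns the ranking together with the per-feature entropy scores. No class labels are needed, which is the property the abstract contrasts with Relief. The all-pairs distance computation is quadratic in the number of samples, so this sketch suits small datasets only.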

Original language	English (US)
Title of host publication	Proceedings of the International Conference on Tools with Artificial Intelligence
Editors	Anon
Publisher	IEEE
Pages	532-539
Number of pages	8
State	Published - 1997
Externally published	Yes
Event	Proceedings of the 1997 IEEE 9th IEEE International Conference on Tools with Artificial Intelligence - Newport Beach, CA, USA; Duration: Nov 3 1997 → Nov 8 1997

Other

Other	Proceedings of the 1997 IEEE 9th IEEE International Conference on Tools with Artificial Intelligence
City	Newport Beach, CA, USA
Period	11/3/97 → 11/8/97

ASJC Scopus subject areas

Software

Cite this

@inproceedings{4ded2bb23b14492a88a9c8372648bd51,

title = "Dimensionality reduction of unsupervised data",

abstract = "Dimensionality reduction is an important problem for efficient handling of large databases. Many feature selection methods exist for supervised data having class information. Little work has been done for dimensionality reduction of unsupervised data in which class information is not available. Principal Component Analysis (PCA) is often used. However, PCA creates new features. It is difficult to obtain intuitive understanding of the data using the new features only. In this paper we are concerned with the problem of determining and choosing the important original features for unsupervised data. Our method is based on the observation that removing an irrelevant feature from the feature set may not change the underlying concept of the data, but not so otherwise. We propose an entropy measure for ranking features, and conduct extensive experiments to show that our method is able to find the important features. Also it compares well with a similar feature ranking method (Relief) that requires class information unlike our method.",

author = "M. Dash and Huan Liu and J. Yao",

year = "1997",

language = "English (US)",

pages = "532--539",

editor = "Anon",

booktitle = "Proceedings of the International Conference on Tools with Artificial Intelligence",

publisher = "IEEE",

note = "Proceedings of the 1997 IEEE 9th IEEE International Conference on Tools with Artificial Intelligence ; Conference date: 03-11-1997 Through 08-11-1997",

}

TY - GEN

T1 - Dimensionality reduction of unsupervised data

AU - Dash, M.

AU - Liu, Huan

AU - Yao, J.

PY - 1997

Y1 - 1997

N2 - Dimensionality reduction is an important problem for efficient handling of large databases. Many feature selection methods exist for supervised data having class information. Little work has been done for dimensionality reduction of unsupervised data in which class information is not available. Principal Component Analysis (PCA) is often used. However, PCA creates new features. It is difficult to obtain intuitive understanding of the data using the new features only. In this paper we are concerned with the problem of determining and choosing the important original features for unsupervised data. Our method is based on the observation that removing an irrelevant feature from the feature set may not change the underlying concept of the data, but not so otherwise. We propose an entropy measure for ranking features, and conduct extensive experiments to show that our method is able to find the important features. Also it compares well with a similar feature ranking method (Relief) that requires class information unlike our method.

UR - http://www.scopus.com/inward/record.url?scp=0031359166&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0031359166&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0031359166

SP - 532

EP - 539

BT - Proceedings of the International Conference on Tools with Artificial Intelligence

A2 - Anon

PB - IEEE

T2 - Proceedings of the 1997 IEEE 9th IEEE International Conference on Tools with Artificial Intelligence

Y2 - 3 November 1997 through 8 November 1997

ER -
