Using uncorrelated discriminant analysis for tissue classification with gene expression data

Jieping Ye, Tao Li, Tao Xiong, Ravi Janardan

Research output: Contribution to journal › Article

140 Citations (Scopus)

Abstract

The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high (in the thousands) compared to the number of data samples (in the tens or low hundreds); that is, the data dimension is large compared to the number of data points (such data is said to be undersampled). To cope with performance and accuracy problems associated with high dimensionality, it is commonplace to apply a preprocessing step that transforms the data to a space of significantly lower dimension with limited loss of the information present in the original data. Linear Discriminant Analysis (LDA) is a well-known technique for dimension reduction and feature extraction, but it is not applicable for undersampled data due to singularity problems associated with the matrices in the underlying representation. This paper presents a dimension reduction and feature extraction scheme, called Uncorrelated Linear Discriminant Analysis (ULDA), for undersampled problems and illustrates its utility on gene expression data. ULDA employs the Generalized Singular Value Decomposition method to handle undersampled data and the features that it produces in the transformed space are uncorrelated, which makes it attractive for gene expression data. The properties of ULDA are established rigorously and extensive experimental results on gene expression data are presented to illustrate its effectiveness in classifying tissue samples. These results provide a comparative study of various state-of-the-art classification methods on well-known gene expression data sets.
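The singularity problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' ULDA algorithm; it only shows, on synthetic data, that the within-class scatter matrix used by classical LDA cannot have full rank when there are fewer samples than genes. The sizes, the random data, and the helper name `within_class_scatter` are assumptions made for illustration.

```python
import numpy as np


def within_class_scatter(X, y):
    """Return the d x d within-class scatter matrix for samples X (n x d)."""
    d = X.shape[1]
    S_w = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        centered = Xc - Xc.mean(axis=0)  # subtract the class centroid
        S_w += centered.T @ centered
    return S_w


# Hypothetical undersampled setting: far more genes than tissue samples.
rng = np.random.default_rng(0)
n_samples, n_genes, n_classes = 20, 200, 2
X = rng.normal(size=(n_samples, n_genes))
y = np.repeat(np.arange(n_classes), n_samples // n_classes)

S_w = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_w)

# The rank is bounded by n_samples - n_classes, so the 200 x 200 matrix
# is singular and cannot be inverted as classical LDA requires.
print(f"shape={S_w.shape}, rank={rank}, bound={n_samples - n_classes}")
```

Because the rank of the scatter matrix is at most the number of samples minus the number of classes, any data set with thousands of genes and tens of samples runs into this bound, which is the situation the abstract calls undersampled.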

Original language: English (US)
Pages (from-to): 181-190
Number of pages: 10
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 1
Issue number: 4
DOIs: 10.1109/TCBB.2004.45
State: Published - Oct 2004
Externally published: Yes

Fingerprint

Discriminant Analysis
Discriminant analysis
Gene Expression Data
Gene expression
discriminant analysis
Tissue
Gene Expression
gene expression
Feature extraction
Dimension Reduction
Feature Extraction
Generalized Singular Value Decomposition
disease diagnosis
Singular value decomposition
sampling
tissues
Genes
Decomposition Method
methodology
Comparative Study

Keywords

  • Classification
  • Discriminant analysis
  • Generalized singular value decomposition
  • Microarray data analysis

ASJC Scopus subject areas

  • Engineering(all)
  • Agricultural and Biological Sciences (miscellaneous)

Cite this

Using uncorrelated discriminant analysis for tissue classification with gene expression data. / Ye, Jieping; Li, Tao; Xiong, Tao; Janardan, Ravi.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 1, No. 4, 10.2004, p. 181-190.


@article{dabbe0a65a41497ebc0c51f217be8ba9,
title = "Using uncorrelated discriminant analysis for tissue classification with gene expression data",
abstract = "The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high (in the thousands) compared to the number of data samples (in the tens or low hundreds); that is, the data dimension is large compared to the number of data points (such data is said to be undersampled). To cope with performance and accuracy problems associated with high dimensionality, it is commonplace to apply a preprocessing step that transforms the data to a space of significantly lower dimension with limited loss of the information present in the original data. Linear Discriminant Analysis (LDA) is a well-known technique for dimension reduction and feature extraction, but it is not applicable for undersampled data due to singularity problems associated with the matrices in the underlying representation. This paper presents a dimension reduction and feature extraction scheme, called Uncorrelated Linear Discriminant Analysis (ULDA), for undersampled problems and illustrates its utility on gene expression data. ULDA employs the Generalized Singular Value Decomposition method to handle undersampled data and the features that it produces in the transformed space are uncorrelated, which makes it attractive for gene expression data. The properties of ULDA are established rigorously and extensive experimental results on gene expression data are presented to illustrate its effectiveness in classifying tissue samples. These results provide a comparative study of various state-of-the-art classification methods on well-known gene expression data sets.",
keywords = "Classification, Discriminant analysis, Generalized singular value decomposition, Microarray data analysis",
author = "Jieping Ye and Tao Li and Tao Xiong and Ravi Janardan",
year = "2004",
month = "10",
doi = "10.1109/TCBB.2004.45",
language = "English (US)",
volume = "1",
pages = "181--190",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "4",

}

TY - JOUR

T1 - Using uncorrelated discriminant analysis for tissue classification with gene expression data

AU - Ye, Jieping

AU - Li, Tao

AU - Xiong, Tao

AU - Janardan, Ravi

PY - 2004/10

Y1 - 2004/10

N2 - The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high (in the thousands) compared to the number of data samples (in the tens or low hundreds); that is, the data dimension is large compared to the number of data points (such data is said to be undersampled). To cope with performance and accuracy problems associated with high dimensionality, it is commonplace to apply a preprocessing step that transforms the data to a space of significantly lower dimension with limited loss of the information present in the original data. Linear Discriminant Analysis (LDA) is a well-known technique for dimension reduction and feature extraction, but it is not applicable for undersampled data due to singularity problems associated with the matrices in the underlying representation. This paper presents a dimension reduction and feature extraction scheme, called Uncorrelated Linear Discriminant Analysis (ULDA), for undersampled problems and illustrates its utility on gene expression data. ULDA employs the Generalized Singular Value Decomposition method to handle undersampled data and the features that it produces in the transformed space are uncorrelated, which makes it attractive for gene expression data. The properties of ULDA are established rigorously and extensive experimental results on gene expression data are presented to illustrate its effectiveness in classifying tissue samples. These results provide a comparative study of various state-of-the-art classification methods on well-known gene expression data sets.

AB - The classification of tissue samples based on gene expression data is an important problem in medical diagnosis of diseases such as cancer. In gene expression data, the number of genes is usually very high (in the thousands) compared to the number of data samples (in the tens or low hundreds); that is, the data dimension is large compared to the number of data points (such data is said to be undersampled). To cope with performance and accuracy problems associated with high dimensionality, it is commonplace to apply a preprocessing step that transforms the data to a space of significantly lower dimension with limited loss of the information present in the original data. Linear Discriminant Analysis (LDA) is a well-known technique for dimension reduction and feature extraction, but it is not applicable for undersampled data due to singularity problems associated with the matrices in the underlying representation. This paper presents a dimension reduction and feature extraction scheme, called Uncorrelated Linear Discriminant Analysis (ULDA), for undersampled problems and illustrates its utility on gene expression data. ULDA employs the Generalized Singular Value Decomposition method to handle undersampled data and the features that it produces in the transformed space are uncorrelated, which makes it attractive for gene expression data. The properties of ULDA are established rigorously and extensive experimental results on gene expression data are presented to illustrate its effectiveness in classifying tissue samples. These results provide a comparative study of various state-of-the-art classification methods on well-known gene expression data sets.

KW - Classification

KW - Discriminant analysis

KW - Generalized singular value decomposition

KW - Microarray data analysis

UR - http://www.scopus.com/inward/record.url?scp=14744274588&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=14744274588&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2004.45

DO - 10.1109/TCBB.2004.45

M3 - Article

VL - 1

SP - 181

EP - 190

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 4

ER -