Reliable identification of large numbers of candidate SNPs from public EST data

Kenneth Buetow, Michael N. Edmonson, Anna B. Cassidy

Research output: Contribution to journal, Article

215 Citations (Scopus)

Abstract

High-resolution genetic analysis of the human genome promises to provide insight into common disease susceptibility. Performing such analysis will require a collection of high-throughput, high-density analysis reagents. We have developed a polymorphism detection system that uses public-domain sequence data. This detection system is called the single nucleotide polymorphism pipeline (SNPpipeline). The analytic core of the SNPpipeline is composed of three components: PHRED, PHRAP and DEMIGLACE. PHRED and PHRAP are components of a sequence analysis suite developed to perform the semi-automated analysis required for large-scale genomes (provided courtesy of P. Green). Using these informatics tools, which examine redundant raw expressed sequence tag (EST) data, we have identified more than 3,000 candidate single-nucleotide polymorphisms (SNPs). Empiric validation studies of a set of 192 candidates indicate that 82% identify variation in a sample of ten Centre d'Etude du Polymorphisme Humain (CEPH) individuals. Our results suggest that existing sequence resources may serve as a valuable source for identifying genetic variation.
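The idea of mining redundant, quality-scored reads for candidate variants can be illustrated with a minimal sketch. This is not the SNPpipeline or DEMIGLACE algorithm, whose details the abstract does not give. It assumes reads are already aligned to equal length, that each base carries a Phred-style quality score Q (error probability 10^(-Q/10)), and that a column is a candidate when at least two different bases are each supported by several high-quality reads. The function names and both thresholds (`min_qual`, `min_support`) are hypothetical choices made for illustration.

```python
from collections import Counter


def phred_error_probability(q):
    """Convert a Phred quality score Q into an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def candidate_snps(reads, quals, min_qual=20, min_support=2):
    """Return (position, allele_counts) for alignment columns where at least two
    different bases are each seen in `min_support` or more high-quality reads.

    reads: equal-length aligned strings; '-' marks a gap or missing coverage.
    quals: per-read lists of Phred scores, one score per alignment column.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    candidates = []
    for pos in range(len(reads[0])):
        # Count only confidently called A/C/G/T bases in this column.
        counts = Counter(
            read[pos]
            for read, q in zip(reads, quals)
            if read[pos] in "ACGT" and q[pos] >= min_qual
        )
        # Keep alleles backed by enough independent reads.
        supported = {base: n for base, n in counts.items() if n >= min_support}
        if len(supported) >= 2:
            candidates.append((pos, supported))
    return candidates


# Toy alignment: column 2 shows G in three reads and T in two reads.
# The lone A in column 3 has low quality (Q10) and is ignored.
reads = ["ACGTA", "ACGTA", "ACTTA", "ACTTA", "ACGAA"]
quals = [[30] * 5, [30] * 5, [30] * 5, [30] * 5, [30, 30, 30, 10, 30]]
print(candidate_snps(reads, quals))
```

Requiring support from more than one read per allele is one simple way to separate likely polymorphisms from isolated sequencing errors. A real pipeline would also need base calling and assembly steps, roles the abstract attributes to PHRED and PHRAP.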

Original language: English (US)
Pages (from-to): 323-325
Number of pages: 3
Journal: Nature Genetics
Volume: 21
Issue number: 3
DOI: 10.1038/6851
State: Published - Mar 1999
Externally published: Yes

Fingerprint

Expressed Sequence Tags
Single Nucleotide Polymorphism
Informatics
Validation Studies
Public Sector
Disease Susceptibility
Human Genome
Sequence Analysis
Genome

ASJC Scopus subject areas

  • Genetics (clinical)
  • Genetics

Cite this

Reliable identification of large numbers of candidate SNPs from public EST data. / Buetow, Kenneth; Edmonson, Michael N.; Cassidy, Anna B.

In: Nature Genetics, Vol. 21, No. 3, 03.1999, p. 323-325.

@article{88fb8f704be04967a23f7b661302d041,
title = "Reliable identification of large numbers of candidate SNPs from public EST data",
author = "Buetow, Kenneth and Edmonson, {Michael N.} and Cassidy, {Anna B.}",
year = "1999",
month = "3",
doi = "10.1038/6851",
language = "English (US)",
volume = "21",
pages = "323--325",
journal = "Nature Genetics",
issn = "1061-4036",
publisher = "Nature Publishing Group",
number = "3",
}

TY - JOUR

T1 - Reliable identification of large numbers of candidate SNPs from public EST data

AU - Buetow, Kenneth

AU - Edmonson, Michael N.

AU - Cassidy, Anna B.

PY - 1999/3

Y1 - 1999/3

UR - http://www.scopus.com/inward/record.url?scp=0033018816&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0033018816&partnerID=8YFLogxK

U2 - 10.1038/6851

DO - 10.1038/6851

M3 - Article

C2 - 10080189

AN - SCOPUS:0033018816

VL - 21

SP - 323

EP - 325

JO - Nature Genetics

JF - Nature Genetics

SN - 1061-4036

IS - 3

ER -