Multiple sequence alignment: In pursuit of homologous DNA positions

Sudhir Kumar; Alan Filipski

doi:10.1101/gr.5232407

Life Sciences, School of (SOLS)

Research output: Contribution to journal › Review article › peer-review

101 Scopus citations

Abstract

DNA sequence alignment is a prerequisite to virtually all comparative genomic analyses, including the identification of conserved sequence motifs, estimation of evolutionary divergence between sequences, and inference of historical relationships among genes and species. While it is mere common sense that inaccuracies in multiple sequence alignments can have detrimental effects on downstream analyses, it is important to know the extent to which the inferences drawn from these alignments are robust to errors and biases inherent in all sequence alignments. A survey of investigations into strengths and weaknesses of sequence alignments reveals, as expected, that alignment quality is generally poor for two distantly related sequences and can often be improved by adding additional sequences as stepping stones between distantly related species. Errors in sequence alignment are also found to have a significant negative effect on subsequent inference of sequence divergence, phylogenetic trees, and conserved motifs. However, our understanding of alignment biases remains rudimentary, and sequence alignment procedures continue to be used somewhat like benign formatting operations to make sequences equal in length. Because of the central role these alignments now play in our endeavors to establish the tree of life and to identify important parts of genomes through evolutionary functional genomics, we see a need for increased community effort to investigate influences of alignment bias on the accuracy of large-scale comparative genomics.

Original language	English (US)
Pages (from-to)	127-135
Number of pages	9
Journal	Genome Research
Volume	17
Issue number	2
DOIs	https://doi.org/10.1101/gr.5232407
State	Published - Feb 2007

ASJC Scopus subject areas

Genetics
Genetics (clinical)

Cite this

@article{6701feb0622b46ca828d6cf6e68810ec,

title = "Multiple sequence alignment: In pursuit of homologous DNA positions",

abstract = "DNA sequence alignment is a prerequisite to virtually all comparative genomic analyses, including the identification of conserved sequence motifs, estimation of evolutionary divergence between sequences, and inference of historical relationships among genes and species. While it is mere common sense that inaccuracies in multiple sequence alignments can have detrimental effects on downstream analyses, it is important to know the extent to which the inferences drawn from these alignments are robust to errors and biases inherent in all sequence alignments. A survey of investigations into strengths and weaknesses of sequence alignments reveals, as expected, that alignment quality is generally poor for two distantly related sequences and can often be improved by adding additional sequences as stepping stones between distantly related species. Errors in sequence alignment are also found to have a significant negative effect on subsequent inference of sequence divergence, phylogenetic trees, and conserved motifs. However, our understanding of alignment biases remains rudimentary, and sequence alignment procedures continue to be used somewhat like benign formatting operations to make sequences equal in length. Because of the central role these alignments now play in our endeavors to establish the tree of life and to identify important parts of genomes through evolutionary functional genomics, we see a need for increased community effort to investigate influences of alignment bias on the accuracy of large-scale comparative genomics.",

author = "Sudhir Kumar and Alan Filipski",

year = "2007",

month = feb,

doi = "10.1101/gr.5232407",

language = "English (US)",

volume = "17",

pages = "127--135",

journal = "Genome Research",

issn = "1088-9051",

publisher = "Cold Spring Harbor Laboratory Press",

number = "2",

}

TY - JOUR

T1 - Multiple sequence alignment

T2 - In pursuit of homologous DNA positions

AU - Kumar, Sudhir

AU - Filipski, Alan

PY - 2007/2

Y1 - 2007/2

N2 - DNA sequence alignment is a prerequisite to virtually all comparative genomic analyses, including the identification of conserved sequence motifs, estimation of evolutionary divergence between sequences, and inference of historical relationships among genes and species. While it is mere common sense that inaccuracies in multiple sequence alignments can have detrimental effects on downstream analyses, it is important to know the extent to which the inferences drawn from these alignments are robust to errors and biases inherent in all sequence alignments. A survey of investigations into strengths and weaknesses of sequence alignments reveals, as expected, that alignment quality is generally poor for two distantly related sequences and can often be improved by adding additional sequences as stepping stones between distantly related species. Errors in sequence alignment are also found to have a significant negative effect on subsequent inference of sequence divergence, phylogenetic trees, and conserved motifs. However, our understanding of alignment biases remains rudimentary, and sequence alignment procedures continue to be used somewhat like benign formatting operations to make sequences equal in length. Because of the central role these alignments now play in our endeavors to establish the tree of life and to identify important parts of genomes through evolutionary functional genomics, we see a need for increased community effort to investigate influences of alignment bias on the accuracy of large-scale comparative genomics.

UR - http://www.scopus.com/inward/record.url?scp=33846869288&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846869288&partnerID=8YFLogxK

U2 - 10.1101/gr.5232407

DO - 10.1101/gr.5232407

M3 - Review article

C2 - 17272647

AN - SCOPUS:33846869288

SN - 1088-9051

VL - 17

SP - 127

EP - 135

JO - Genome Research

JF - Genome Research

IS - 2

ER -
