Integrating Markov clustering and molecular phylogenetics to reconstruct the cyanobacterial species tree from conserved protein families

Wesley D. Swingley, Robert E. Blankenship, Jason Raymond

Research output: Contribution to journal (Article)

51 Citations (Scopus)

Abstract

Attempts to classify living organisms by their physical characteristics are as old as biology itself. The advent of protein and DNA sequencing - most notably the use of 16S ribosomal RNA - defined a new level of classification that now forms our basic understanding of the history of life on earth. High-throughput sequencing currently provides DNA sequences at an unprecedented rate, not only providing a wealth of information but also posing considerable analytical challenges. Here we present comparative genomics-based methods useful for automating evolutionary analysis between any number of species. As a practical example, we applied our method to the well-studied cyanobacterial lineage. The 24 cyanobacterial genomes compared here occupy a wide variety of environmental niches and play major roles in global carbon and nitrogen cycles. By integrating phylogenetic data inferred for upward of 1,000 protein-coding genes common to all or most cyanobacteria, we have reconstructed an evolutionary history of the phylum, establishing a framework for resolving key issues regarding the evolution of their metabolic and phenotypic diversity. Greater resolution on individual branches can be attained by telescoping inward to the larger set of conserved proteins between fewer taxa. The construction of all individual protein phylogenies allows for quantitative tree scoring, providing insight into the evolutionary history of each protein family as well as probing the limits of phylogenetic resolution. The tools incorporated here are fast, computationally tractable, and easily extendable to other phyla and provide a scaleable framework for contrasting and integrating the information present in thousands of protein-coding genes within related genomes.

Original language: English (US)
Pages (from-to): 643-654
Number of pages: 12
Journal: Molecular Biology and Evolution
Volume: 25
Issue number: 4
DOI: 10.1093/molbev/msn034
State: Published - Apr 2008
Externally published: Yes

Keywords

  • Cyanobacteria
  • Evolution
  • Genomics
  • Markov clustering
  • Phylogenomics

ASJC Scopus subject areas

  • Genetics
  • Biochemistry
  • Genetics (clinical)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Ecology, Evolution, Behavior and Systematics
  • Agricultural and Biological Sciences (miscellaneous)
  • Molecular Biology

Cite this

Integrating Markov clustering and molecular phylogenetics to reconstruct the cyanobacterial species tree from conserved protein families. / Swingley, Wesley D.; Blankenship, Robert E.; Raymond, Jason.

In: Molecular Biology and Evolution, Vol. 25, No. 4, 04.2008, p. 643-654.

@article{1a0d1e9109054f29a212c8e8eca619b9,
title = "Integrating Markov clustering and molecular phylogenetics to reconstruct the cyanobacterial species tree from conserved protein families",
keywords = "Cyanobacteria, Evolution, Genomics, Markov clustering, Phylogenomics",
author = "Swingley, {Wesley D.} and Blankenship, {Robert E.} and Raymond, Jason",
year = "2008",
month = "4",
doi = "10.1093/molbev/msn034",
language = "English (US)",
volume = "25",
pages = "643--654",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "4",
}

TY - JOUR

T1 - Integrating Markov clustering and molecular phylogenetics to reconstruct the cyanobacterial species tree from conserved protein families

AU - Swingley, Wesley D.

AU - Blankenship, Robert E.

AU - Raymond, Jason

PY - 2008/4

Y1 - 2008/4

AB - Attempts to classify living organisms by their physical characteristics are as old as biology itself. The advent of protein and DNA sequencing - most notably the use of 16S ribosomal RNA - defined a new level of classification that now forms our basic understanding of the history of life on earth. High-throughput sequencing currently provides DNA sequences at an unprecedented rate, not only providing a wealth of information but also posing considerable analytical challenges. Here we present comparative genomics-based methods useful for automating evolutionary analysis between any number of species. As a practical example, we applied our method to the well-studied cyanobacterial lineage. The 24 cyanobacterial genomes compared here occupy a wide variety of environmental niches and play major roles in global carbon and nitrogen cycles. By integrating phylogenetic data inferred for upward of 1,000 protein-coding genes common to all or most cyanobacteria, we have reconstructed an evolutionary history of the phylum, establishing a framework for resolving key issues regarding the evolution of their metabolic and phenotypic diversity. Greater resolution on individual branches can be attained by telescoping inward to the larger set of conserved proteins between fewer taxa. The construction of all individual protein phylogenies allows for quantitative tree scoring, providing insight into the evolutionary history of each protein family as well as probing the limits of phylogenetic resolution. The tools incorporated here are fast, computationally tractable, and easily extendable to other phyla and provide a scaleable framework for contrasting and integrating the information present in thousands of protein-coding genes within related genomes.

KW - Cyanobacteria

KW - Evolution

KW - Genomics

KW - Markov clustering

KW - Phylogenomics

UR - http://www.scopus.com/inward/record.url?scp=40849094319&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=40849094319&partnerID=8YFLogxK

U2 - 10.1093/molbev/msn034

DO - 10.1093/molbev/msn034

M3 - Article

C2 - 18296704

AN - SCOPUS:40849094319

VL - 25

SP - 643

EP - 654

JO - Molecular Biology and Evolution

JF - Molecular Biology and Evolution

SN - 0737-4038

IS - 4

ER -