A composite genome approach to identify phylogenetically informative data from next-generation sequencing

Rachel S. Schwartz, Kelly M. Harkins, Anne C. Stone, Reed A. Cartwright

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Background: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time-consuming steps of de novo whole-genome assembly, multiple genome alignment, and annotation.

Results: In simulations, SISRS identifies large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers, which we then used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets.

Conclusions: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole-genome shotgun sequence data directly.

Original language: English (US)
Article number: 193
Journal: BMC Bioinformatics
Volume: 16
Issue number: 1
DOIs: 10.1186/s12859-015-0632-y
State: Published - Jun 11 2015

Keywords

  • Apes
  • Mammals
  • Next-generation sequencing
  • Phylogenetics

ASJC Scopus subject areas

  • Applied Mathematics
  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications

Cite this

A composite genome approach to identify phylogenetically informative data from next-generation sequencing. / Schwartz, Rachel S.; Harkins, Kelly M.; Stone, Anne C.; Cartwright, Reed A.

In: BMC Bioinformatics, Vol. 16, No. 1, 193, 11.06.2015.

@article{bb893b0cdd1447418ef540d20cb4e24d,
title = "A composite genome approach to identify phylogenetically informative data from next-generation sequencing",
abstract = "Background: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation. Results: For simulations SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets. Conclusions: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly.",
keywords = "Apes, Mammals, Next-generation sequencing, Phylogenetics",
author = "Schwartz, {Rachel S.} and Harkins, {Kelly M.} and Stone, {Anne C.} and Cartwright, {Reed A.}",
year = "2015",
month = "6",
day = "11",
doi = "10.1186/s12859-015-0632-y",
language = "English (US)",
volume = "16",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",
pages = "193",
}

TY - JOUR
T1 - A composite genome approach to identify phylogenetically informative data from next-generation sequencing
AU - Schwartz, Rachel S.
AU - Harkins, Kelly M.
AU - Stone, Anne C.
AU - Cartwright, Reed A.
PY - 2015/6/11
Y1 - 2015/6/11
AB - Background: Improvements in sequencing technology now allow easy acquisition of large datasets; however, analyzing these data for phylogenetics can be challenging. We have developed a novel method to rapidly obtain homologous genomic data for phylogenetics directly from next-generation sequencing reads without the use of a reference genome. This software, called SISRS, avoids the time consuming steps of de novo whole genome assembly, multiple genome alignment, and annotation. Results: For simulations SISRS is able to identify large numbers of loci containing variable sites with phylogenetic signal. For genomic data from apes, SISRS identified thousands of variable sites, from which we produced an accurate phylogeny. Finally, we used SISRS to identify phylogenetic markers that we used to estimate the phylogeny of placental mammals. We recovered eight phylogenies that resolved the basal relationships among mammals using datasets with different levels of missing data. The three alternate resolutions of the basal relationships are consistent with the major hypotheses for the relationships among mammals, all of which have been supported previously by different molecular datasets. Conclusions: SISRS has the potential to transform phylogenetic research. This method eliminates the need for expensive marker development in many studies by using whole genome shotgun sequence data directly.

KW - Apes
KW - Mammals
KW - Next-generation sequencing
KW - Phylogenetics
UR - http://www.scopus.com/inward/record.url?scp=84934907977&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84934907977&partnerID=8YFLogxK
U2 - 10.1186/s12859-015-0632-y
DO - 10.1186/s12859-015-0632-y
M3 - Article
C2 - 26062548
AN - SCOPUS:84934907977
VL - 16
JO - BMC Bioinformatics
JF - BMC Bioinformatics
SN - 1471-2105
IS - 1
M1 - 193
ER -