The effects of random taxa sampling schemes in Bayesian virus phylogeography

Daniel Magee, Matthew Scotch

Research output: Contribution to journal › Article

Abstract

Public health researchers are often tasked with accurately and quickly identifying the location and time when an epidemic originated from a representative sample of nucleotide sequences. In this paper, we investigate multiple approaches to subsampling the sequence set when employing a Bayesian phylogeographic generalized linear model. Our results indicate that near-categorical posterior MCC estimates on the root can be obtained with replicate runs using 25–50% of the sequence data, and that including 90% of sequences does not necessarily entail more accurate inferences. We present the first analysis of predictor signal suppression and show how the ability to detect the influence of predictor variables is limited when sample size predictors are included in the models.
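
As a rough illustration of the kind of random subsampling the abstract refers to, the sketch below draws replicate random subsets of a taxon list at a chosen fraction (for example 25% or 50%). It is a minimal sketch under stated assumptions, not the authors' procedure: the function name `subsample_taxa`, the taxon labels, the seed, and the number of replicates are all hypothetical, and the sampling schemes actually compared in the paper may differ.

```python
import random


def subsample_taxa(taxa, fraction, n_replicates, seed=0):
    """Draw replicate simple random subsamples (without replacement) of taxa.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the replicates are reproducible
    k = max(1, round(len(taxa) * fraction))  # subsample size, at least one taxon
    return [sorted(rng.sample(taxa, k)) for _ in range(n_replicates)]


# Made-up taxon labels standing in for sequence identifiers.
taxa = [f"seq{i:03d}" for i in range(200)]
replicates = subsample_taxa(taxa, fraction=0.25, n_replicates=3)
for rep in replicates:
    print(len(rep), rep[:3])
```

Each replicate could then be analyzed independently, and the root estimates compared across replicates to judge how sensitive the inference is to the subsample drawn.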

Original language: English (US)
Pages (from-to): 225-230
Number of pages: 6
Journal: Infection, Genetics and Evolution
Volume: 64
DOI: 10.1016/j.meegid.2018.07.003
State: Published - Oct 1 2018

Keywords

  • Phylogeography
  • Selection Bias
  • Viruses

ASJC Scopus subject areas

  • Microbiology
  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics
  • Microbiology (medical)
  • Infectious Diseases

Cite this

The effects of random taxa sampling schemes in Bayesian virus phylogeography. / Magee, Daniel; Scotch, Matthew.

In: Infection, Genetics and Evolution, Vol. 64, 01.10.2018, p. 225-230.

@article{eb896252f2d14d0bb411cf42d1f7078a,
title = "The effects of random taxa sampling schemes in Bayesian virus phylogeography",
abstract = "Public health researchers are often tasked with accurately and quickly identifying the location and time when an epidemic originated from a representative sample of nucleotide sequences. In this paper, we investigate multiple approaches to subsampling the sequence set when employing a Bayesian phylogeographic generalized linear model. Our results indicate that near-categorical posterior MCC estimates on the root can be obtained with replicate runs using 25–50{\%} of the sequence data, and that including 90{\%} of sequences does not necessarily entail more accurate inferences. We present the first analysis of predictor signal suppression and show how the ability to detect the influence of predictor variables is limited when sample size predictors are included in the models.",
keywords = "Phylogeography, Selection Bias, Viruses",
author = "Daniel Magee and Matthew Scotch",
year = "2018",
month = "10",
day = "1",
doi = "10.1016/j.meegid.2018.07.003",
language = "English (US)",
volume = "64",
pages = "225--230",
journal = "Infection, Genetics and Evolution",
issn = "1567-1348",
publisher = "Elsevier"
}

TY  - JOUR
T1  - The effects of random taxa sampling schemes in Bayesian virus phylogeography
AU  - Magee, Daniel
AU  - Scotch, Matthew
PY  - 2018/10/1
Y1  - 2018/10/1
N2  - Public health researchers are often tasked with accurately and quickly identifying the location and time when an epidemic originated from a representative sample of nucleotide sequences. In this paper, we investigate multiple approaches to subsampling the sequence set when employing a Bayesian phylogeographic generalized linear model. Our results indicate that near-categorical posterior MCC estimates on the root can be obtained with replicate runs using 25–50% of the sequence data, and that including 90% of sequences does not necessarily entail more accurate inferences. We present the first analysis of predictor signal suppression and show how the ability to detect the influence of predictor variables is limited when sample size predictors are included in the models.
AB  - Public health researchers are often tasked with accurately and quickly identifying the location and time when an epidemic originated from a representative sample of nucleotide sequences. In this paper, we investigate multiple approaches to subsampling the sequence set when employing a Bayesian phylogeographic generalized linear model. Our results indicate that near-categorical posterior MCC estimates on the root can be obtained with replicate runs using 25–50% of the sequence data, and that including 90% of sequences does not necessarily entail more accurate inferences. We present the first analysis of predictor signal suppression and show how the ability to detect the influence of predictor variables is limited when sample size predictors are included in the models.
KW  - Phylogeography
KW  - Selection Bias
KW  - Viruses
UR  - http://www.scopus.com/inward/record.url?scp=85049478540&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85049478540&partnerID=8YFLogxK
U2  - 10.1016/j.meegid.2018.07.003
DO  - 10.1016/j.meegid.2018.07.003
M3  - Article
C2  - 29991455
AN  - SCOPUS:85049478540
VL  - 64
SP  - 225
EP  - 230
JO  - Infection, Genetics and Evolution
JF  - Infection, Genetics and Evolution
SN  - 1567-1348
ER  -