The effect of SNP discovery method and sample size on estimation of population genetic data for Chinese and Indian rhesus macaques (Macaca mulatta)

Jessica A Satkoski Trask, Ripan S. Malhi, Sreetharan Kanthaswamy, Jesse Johnson, Wendy T. Garnica, Venkat S. Malladi, David Glenn Smith

Research output: Contribution to journal (Article)

27 Citations (Scopus)

Abstract

This study was designed to address issues regarding sample size and marker location that have arisen from the discovery of SNPs in the genomes of poorly characterized primate species and the application of these markers to the study of primate population genetics. First, we predict the effect of discovery sample size on the probability of discovering both rare and common SNPs and then compare this prediction with the proportion of common and rare SNPs discovered when different numbers of individuals are sequenced. Second, we examine the effect of genomic region on estimates of common population genetic data, comparing markers from both coding and non-coding regions of the rhesus macaque genome and the population genetic data calculated from these markers, to measure the degree and direction of bias introduced by SNPs located in coding versus non-coding regions of the genome. We found that both discovery sample size and genomic region surveyed affect SNP marker attributes and population genetic estimates, even when these are calculated from an expanded data set containing more individuals than the original discovery data set. Although none of the SNP detection methods or genomic regions tested in this study was completely uninformative, these results show that each has a different kind of genetic variation that is suitable for different purposes, and each introduces specific types of bias. Given that each SNP marker has an individual evolutionary history, we calculated that the most complete and unbiased representation of the genetic diversity present in the individual can be obtained by incorporating at least 10 individuals into the discovery sample set, to ensure the discovery of both common and rare polymorphisms.
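The abstract does not state how the discovery probability was predicted. As a rough illustration only, the sketch below uses a textbook binomial approximation rather than the authors' calculation: it assumes the 2n chromosomes of n diploid individuals are independent random draws from the population, so a SNP with minor allele frequency p is observed at least once with probability 1 - (1 - p)^(2n). The function names `detection_probability` and `min_individuals` are hypothetical.

```python
def detection_probability(maf: float, n_individuals: int, ploidy: int = 2) -> float:
    """Chance that the minor allele appears at least once in the discovery panel.

    Assumption (not taken from the paper): every sampled chromosome is an
    independent draw that carries the minor allele with probability `maf`.
    """
    if not 0.0 < maf <= 0.5:
        raise ValueError("maf must be in (0, 0.5]")
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    chromosomes = ploidy * n_individuals
    # Complement of "none of the sampled chromosomes carries the minor allele".
    return 1.0 - (1.0 - maf) ** chromosomes


def min_individuals(maf: float, target: float, ploidy: int = 2) -> int:
    """Smallest panel size whose detection probability reaches `target`."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    n = 0
    # The probability rises monotonically with n, so the first hit is the minimum.
    while detection_probability(maf, n, ploidy) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Illustrative frequencies only; these are not values from the study.
    for maf in (0.01, 0.05, 0.25):
        p10 = detection_probability(maf, 10)
        print(f"MAF {maf:.2f}: P(detect | 10 individuals) = {p10:.3f}")
```

Under these assumptions, common variants are almost certain to appear in a panel of 10 individuals while rare ones are frequently missed, which is the kind of ascertainment effect the abstract describes. Relatedness among sampled animals or population structure would break the independence assumption and lower the effective number of chromosomes.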

Original language: English (US)
Pages (from-to): 129-138
Number of pages: 10
Journal: Primates
Volume: 52
Issue number: 2
DOI: 10.1007/s10329-010-0232-4
State: Published - Apr 2011
Externally published: Yes

Fingerprint

Macaca mulatta
population genetics
genomics
genome
Primates
sampling
genetic variation
methodology
genetic polymorphism
history
prediction

Keywords

  • Macaca mulatta
  • Population genetics
  • SNP discovery

ASJC Scopus subject areas

  • Animal Science and Zoology

Cite this

The effect of SNP discovery method and sample size on estimation of population genetic data for Chinese and Indian rhesus macaques (Macaca mulatta). / Trask, Jessica A Satkoski; Malhi, Ripan S.; Kanthaswamy, Sreetharan; Johnson, Jesse; Garnica, Wendy T.; Malladi, Venkat S.; Smith, David Glenn.

In: Primates, Vol. 52, No. 2, 04.2011, p. 129-138.

Trask, Jessica A Satkoski ; Malhi, Ripan S. ; Kanthaswamy, Sreetharan ; Johnson, Jesse ; Garnica, Wendy T. ; Malladi, Venkat S. ; Smith, David Glenn. / The effect of SNP discovery method and sample size on estimation of population genetic data for Chinese and Indian rhesus macaques (Macaca mulatta). In: Primates. 2011 ; Vol. 52, No. 2. pp. 129-138.
@article{5cd9c2ab4e774e76b6a0c3e43dd0cd81,
title = "The effect of SNP discovery method and sample size on estimation of population genetic data for Chinese and Indian rhesus macaques (Macaca mulatta)",
keywords = "Macaca mulatta, Population genetics, SNP discovery",
author = "Trask, {Jessica A Satkoski} and Malhi, {Ripan S.} and Sreetharan Kanthaswamy and Jesse Johnson and Garnica, {Wendy T.} and Malladi, {Venkat S.} and Smith, {David Glenn}",
year = "2011",
month = "4",
doi = "10.1007/s10329-010-0232-4",
language = "English (US)",
volume = "52",
pages = "129--138",
journal = "Primates",
issn = "0032-8332",
publisher = "Springer Japan",
number = "2",
}

TY  - JOUR
T1  - The effect of SNP discovery method and sample size on estimation of population genetic data for Chinese and Indian rhesus macaques (Macaca mulatta)
AU  - Trask, Jessica A Satkoski
AU  - Malhi, Ripan S.
AU  - Kanthaswamy, Sreetharan
AU  - Johnson, Jesse
AU  - Garnica, Wendy T.
AU  - Malladi, Venkat S.
AU  - Smith, David Glenn
PY  - 2011/4
Y1  - 2011/4
KW  - Macaca mulatta
KW  - Population genetics
KW  - SNP discovery
UR  - http://www.scopus.com/inward/record.url?scp=79953244620&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79953244620&partnerID=8YFLogxK
U2  - 10.1007/s10329-010-0232-4
DO  - 10.1007/s10329-010-0232-4
M3  - Article
C2  - 21207104
AN  - SCOPUS:79953244620
VL  - 52
SP  - 129
EP  - 138
JO  - Primates
JF  - Primates
SN  - 0032-8332
IS  - 2
ER  -