A Maximum-Likelihood Approach to Estimating the Insertion Frequencies of Transposable Elements from Population Sequencing Data

Xiaoqian Jiang, Haixu Tang, Wazim Mohammed Ismail, Michael Lynch

Research output: Contribution to journalArticle

Abstract

Transposable elements (TEs) contribute to a large fraction of the expansion of many eukaryotic genomes due to the capability of TEs duplicating themselves through transposition. A first step to understanding the roles of TEs in a eukaryotic genome is to characterize the population-wide variation of TE insertions in the species. Here, we present a maximum-likelihood (ML) method for estimating allele frequencies and detecting selection on TE insertions in a diploid population, based on the genotypes at TE insertion sites detected in multiple individuals sampled from the population using paired-end (PE) sequencing reads. Tests of the method on simulated data show that it can accurately estimate the allele frequencies of TE insertions even when the PE sequencing is conducted at a relatively low coverage (=5X). The method can also detect TE insertions under strong selection, and the detection ability increases with sample size in a population, although a substantial fraction of actual TE insertions under selection may be undetected. Application of the ML method to genomic sequencing data collected from a natural Daphnia pulex population shows that, on the one hand, most (>90%) TE insertions present in the reference D. pulex genome are either fixed or nearly fixed (with allele frequencies >0.95); on the other hand, among the nonreference TE insertions (i.e., those detected in some individuals in the population but absent from the reference genome), the majority (>70%) are still at low frequencies (<0.1). Finally, we detected a substantial fraction (∼9%) of nonreference TE insertions under selection.

Original languageEnglish (US)
Pages (from-to)2560-2571
Number of pages12
JournalMolecular Biology and Evolution
Volume35
Issue number10
DOIs
StatePublished - Oct 1 2018

Fingerprint

DNA Transposable Elements
transposons
genome
allele
Population
Gene Frequency
Genome
gene frequency
Daphnia pulex
genomics
genotype
method
Daphnia
transposition (genetics)
Diploidy
methodology
Sample Size
diploidy
population size
Genotype

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics

Cite this

A Maximum-Likelihood Approach to Estimating the Insertion Frequencies of Transposable Elements from Population Sequencing Data. / Jiang, Xiaoqian; Tang, Haixu; Mohammed Ismail, Wazim; Lynch, Michael.

In: Molecular Biology and Evolution, Vol. 35, No. 10, 01.10.2018, p. 2560-2571.

Research output: Contribution to journalArticle

@article{96304284b21545b08ea047a6bd88bf1d,
title = "A Maximum-Likelihood Approach to Estimating the Insertion Frequencies of Transposable Elements from Population Sequencing Data",
abstract = "Transposable elements (TEs) contribute to a large fraction of the expansion of many eukaryotic genomes due to the capability of TEs duplicating themselves through transposition. A first step to understanding the roles of TEs in a eukaryotic genome is to characterize the population-wide variation of TE insertions in the species. Here, we present a maximum-likelihood (ML) method for estimating allele frequencies and detecting selection on TE insertions in a diploid population, based on the genotypes at TE insertion sites detected in multiple individuals sampled from the population using paired-end (PE) sequencing reads. Tests of the method on simulated data show that it can accurately estimate the allele frequencies of TE insertions even when the PE sequencing is conducted at a relatively low coverage (=5X). The method can also detect TE insertions under strong selection, and the detection ability increases with sample size in a population, although a substantial fraction of actual TE insertions under selection may be undetected. Application of the ML method to genomic sequencing data collected from a natural Daphnia pulex population shows that, on the one hand, most (>90{\%}) TE insertions present in the reference D. pulex genome are either fixed or nearly fixed (with allele frequencies >0.95); on the other hand, among the nonreference TE insertions (i.e., those detected in some individuals in the population but absent from the reference genome), the majority (>70{\%}) are still at low frequencies (<0.1). Finally, we detected a substantial fraction (∼9{\%}) of nonreference TE insertions under selection.",
author = "Xiaoqian Jiang and Haixu Tang and {Mohammed Ismail}, Wazim and Michael Lynch",
year = "2018",
month = "10",
day = "1",
doi = "10.1093/molbev/msy152",
language = "English (US)",
volume = "35",
pages = "2560--2571",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "10",

}

TY - JOUR

T1 - A Maximum-Likelihood Approach to Estimating the Insertion Frequencies of Transposable Elements from Population Sequencing Data

AU - Jiang, Xiaoqian

AU - Tang, Haixu

AU - Mohammed Ismail, Wazim

AU - Lynch, Michael

PY - 2018/10/1

Y1 - 2018/10/1

N2 - Transposable elements (TEs) contribute to a large fraction of the expansion of many eukaryotic genomes due to the capability of TEs duplicating themselves through transposition. A first step to understanding the roles of TEs in a eukaryotic genome is to characterize the population-wide variation of TE insertions in the species. Here, we present a maximum-likelihood (ML) method for estimating allele frequencies and detecting selection on TE insertions in a diploid population, based on the genotypes at TE insertion sites detected in multiple individuals sampled from the population using paired-end (PE) sequencing reads. Tests of the method on simulated data show that it can accurately estimate the allele frequencies of TE insertions even when the PE sequencing is conducted at a relatively low coverage (=5X). The method can also detect TE insertions under strong selection, and the detection ability increases with sample size in a population, although a substantial fraction of actual TE insertions under selection may be undetected. Application of the ML method to genomic sequencing data collected from a natural Daphnia pulex population shows that, on the one hand, most (>90%) TE insertions present in the reference D. pulex genome are either fixed or nearly fixed (with allele frequencies >0.95); on the other hand, among the nonreference TE insertions (i.e., those detected in some individuals in the population but absent from the reference genome), the majority (>70%) are still at low frequencies (<0.1). Finally, we detected a substantial fraction (∼9%) of nonreference TE insertions under selection.

AB - Transposable elements (TEs) contribute to a large fraction of the expansion of many eukaryotic genomes due to the capability of TEs duplicating themselves through transposition. A first step to understanding the roles of TEs in a eukaryotic genome is to characterize the population-wide variation of TE insertions in the species. Here, we present a maximum-likelihood (ML) method for estimating allele frequencies and detecting selection on TE insertions in a diploid population, based on the genotypes at TE insertion sites detected in multiple individuals sampled from the population using paired-end (PE) sequencing reads. Tests of the method on simulated data show that it can accurately estimate the allele frequencies of TE insertions even when the PE sequencing is conducted at a relatively low coverage (=5X). The method can also detect TE insertions under strong selection, and the detection ability increases with sample size in a population, although a substantial fraction of actual TE insertions under selection may be undetected. Application of the ML method to genomic sequencing data collected from a natural Daphnia pulex population shows that, on the one hand, most (>90%) TE insertions present in the reference D. pulex genome are either fixed or nearly fixed (with allele frequencies >0.95); on the other hand, among the nonreference TE insertions (i.e., those detected in some individuals in the population but absent from the reference genome), the majority (>70%) are still at low frequencies (<0.1). Finally, we detected a substantial fraction (∼9%) of nonreference TE insertions under selection.

UR - http://www.scopus.com/inward/record.url?scp=85054892567&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054892567&partnerID=8YFLogxK

U2 - 10.1093/molbev/msy152

DO - 10.1093/molbev/msy152

M3 - Article

VL - 35

SP - 2560

EP - 2571

JO - Molecular Biology and Evolution

JF - Molecular Biology and Evolution

SN - 0737-4038

IS - 10

ER -