A statistical guide to the design of deep mutational scanning experiments

Sebastian Matuszewski, Marcel E. Hildebrandt, Ana Hermina Ghenu, Jeffrey Jensen, Claudia Bank

Research output: Contribution to journal (article)

11 Citations (Scopus)

Abstract

The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates.
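The design trade-off described in the abstract can be explored with a small simulation. The sketch below is illustrative only and is not the authors' formula or web tool: it assumes deterministic exponential growth with equal starting frequencies, multinomial read sampling at each time point, and a simple log-linear regression estimator. All function names and parameter values (simulate_counts, estimate_s, rmse_of_design, the depths and time grids) are hypothetical choices made for this example.

```python
import math
import random


def simulate_counts(sel_coeffs, times, depth, rng):
    """Sample read counts per genotype at each time point.

    Assumption: genotype i has frequency proportional to exp(s_i * t),
    all genotypes start at equal frequency, and sequencing draws
    `depth` reads per time point (multinomial sampling).
    """
    n = len(sel_coeffs)
    counts = []
    for t in times:
        weights = [math.exp(s * t) for s in sel_coeffs]
        reads = rng.choices(range(n), weights=weights, k=depth)
        row = [0] * n
        for genotype in reads:
            row[genotype] += 1
        counts.append(row)
    return counts


def estimate_s(counts, times, mutant, reference=0, pseudocount=0.5):
    """Least-squares slope of log(mutant/reference) read ratio against time."""
    y = [
        math.log((row[mutant] + pseudocount) / (row[reference] + pseudocount))
        for row in counts
    ]
    t_mean = sum(times) / len(times)
    y_mean = sum(y) / len(y)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (yi - y_mean) for t, yi in zip(times, y))
    return sxy / sxx


def rmse_of_design(times, depth, true_s=-0.05, n_mutants=20,
                   replicates=100, seed=1):
    """Root-mean-square error of the estimate for one mutant under a design."""
    rng = random.Random(seed)
    sel = [0.0] + [true_s] * n_mutants  # genotype 0 is the reference
    squared_error = 0.0
    for _ in range(replicates):
        counts = simulate_counts(sel, times, depth, rng)
        squared_error += (estimate_s(counts, times, mutant=1) - true_s) ** 2
    return math.sqrt(squared_error / replicates)


if __name__ == "__main__":
    # Same number of time points; only duration or depth changes.
    print("baseline      ", rmse_of_design([0, 2, 4, 6, 8], depth=2000))
    print("double depth  ", rmse_of_design([0, 2, 4, 6, 8], depth=4000))
    print("double length ", rmse_of_design([0, 4, 8, 12, 16], depth=2000))
```

Under these assumptions, spreading the same five samples over twice the duration quadruples the sum of squared time deviations in the regression, whereas doubling the depth only halves the sampling variance of each log ratio. This is a toy illustration of why experiment duration can matter more than depth, not a substitute for the analytical results in the paper.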

Original language: English (US)
Pages (from-to): 77-87
Number of pages: 11
Journal: Genetics
Volume: 204
Issue number: 1
DOIs: https://doi.org/10.1534/genetics.116.190462
State: Published - Sep 1 2016
Externally published: Yes

Fingerprint

High-Throughput Nucleotide Sequencing
Mutation
Cluster Analysis
Guidelines
Confidence Intervals
Technology

Keywords

  • Distribution of fitness effects
  • Experimental design
  • Experimental evolution
  • Mutation
  • Population genetics

ASJC Scopus subject areas

  • Genetics

Cite this

A statistical guide to the design of deep mutational scanning experiments. / Matuszewski, Sebastian; Hildebrandt, Marcel E.; Ghenu, Ana Hermina; Jensen, Jeffrey; Bank, Claudia.

In: Genetics, Vol. 204, No. 1, 01.09.2016, p. 77-87.

Matuszewski, S, Hildebrandt, ME, Ghenu, AH, Jensen, J & Bank, C 2016, 'A statistical guide to the design of deep mutational scanning experiments', Genetics, vol. 204, no. 1, pp. 77-87. https://doi.org/10.1534/genetics.116.190462
Matuszewski, Sebastian ; Hildebrandt, Marcel E. ; Ghenu, Ana Hermina ; Jensen, Jeffrey ; Bank, Claudia. / A statistical guide to the design of deep mutational scanning experiments. In: Genetics. 2016 ; Vol. 204, No. 1. pp. 77-87.
@article{534e9602cb664ade8c70b17f4bf6bd57,
title = "A statistical guide to the design of deep mutational scanning experiments",
keywords = "Distribution of fitness effects, Experimental design, Experimental evolution, Mutation, Population genetics",
author = "Sebastian Matuszewski and Hildebrandt, {Marcel E.} and Ghenu, {Ana Hermina} and Jeffrey Jensen and Claudia Bank",
year = "2016",
month = "9",
day = "1",
doi = "10.1534/genetics.116.190462",
language = "English (US)",
volume = "204",
pages = "77--87",
journal = "Genetics",
issn = "0016-6731",
publisher = "Genetics Society of America",
number = "1",

}

TY - JOUR
T1 - A statistical guide to the design of deep mutational scanning experiments
AU - Matuszewski, Sebastian
AU - Hildebrandt, Marcel E.
AU - Ghenu, Ana Hermina
AU - Jensen, Jeffrey
AU - Bank, Claudia
PY - 2016/9/1
Y1 - 2016/9/1
KW - Distribution of fitness effects
KW - Experimental design
KW - Experimental evolution
KW - Mutation
KW - Population genetics
UR - http://www.scopus.com/inward/record.url?scp=84986254038&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84986254038&partnerID=8YFLogxK
U2 - 10.1534/genetics.116.190462
DO - 10.1534/genetics.116.190462
M3 - Article
C2 - 27412710
AN - SCOPUS:84986254038
VL - 204
SP - 77
EP - 87
JO - Genetics
JF - Genetics
SN - 0016-6731
IS - 1
ER -