A statistical guide to the design of deep mutational scanning experiments

Sebastian Matuszewski; Marcel E. Hildebrandt; Ana Hermina Ghenu; Jeffrey D. Jensen; Claudia Bank

doi:10.1534/genetics.116.190462

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95% confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates.
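The abstract describes estimating selection coefficients from changes in relative abundance across sampled time points. As a rough, hypothetical illustration (not the authors' estimator or error model), the sketch below assumes that a mutant's log-ratio to a single reference genotype changes linearly in time with slope s, fits s by ordinary least squares, and compares sampling designs through the OLS slope-variance factor 1/Σ(t − t̄)², which holds only if every time point carries independent noise of equal variance. The function names, the reference genotype, and the toy numbers are assumptions made for illustration.

```python
import math


def estimate_selection_coefficient(times, mutant_counts, reference_counts):
    """Least-squares slope of ln(mutant / reference) against time.

    Hypothetical model: ln(n_mut(t) / n_ref(t)) = c + s * t, i.e. exponential
    growth measured against one reference genotype.
    """
    log_ratios = [math.log(m / r) for m, r in zip(mutant_counts, reference_counts)]
    t_bar = sum(times) / len(times)
    y_bar = sum(log_ratios) / len(log_ratios)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, log_ratios))
    return sxy / sxx


def slope_variance_factor(times):
    """Var(slope) / sigma^2 = 1 / sum((t - t_bar)^2), assuming i.i.d. noise."""
    t_bar = sum(times) / len(times)
    return 1.0 / sum((t - t_bar) ** 2 for t in times)


if __name__ == "__main__":
    # Noise-free toy data with s = 0.05 per time unit (illustrative values only).
    times = [0, 2, 4, 6, 8, 10]
    mutant = [1000 * math.exp(0.05 * t) for t in times]
    reference = [1000.0] * len(times)
    print("estimated s:", estimate_selection_coefficient(times, mutant, reference))

    # Hypothetical sampling designs; the first two share duration and sample count.
    designs = {
        "evenly spaced": [0, 2, 4, 6, 8, 10],
        "clustered at ends": [0, 0.5, 1, 9, 9.5, 10],
        "more time points": list(range(11)),
        "longer duration": [0, 4, 8, 12, 16, 20],
    }
    for name, design in designs.items():
        print(f"{name:>18}: variance factor = {slope_variance_factor(design):.5f}")
```

Under these assumptions, moving the interior samples of a six-point design toward the start and end raises Σ(t − t̄)² from 70 to 122.5, which shrinks the slope variance; adding time points or extending the duration raises it as well, in line with the qualitative claims above. Real count noise grows as a mutant's read count falls, so this equal-variance picture is only a first approximation.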

Original language	English (US)
Pages (from-to)	77-87
Number of pages	11
Journal	Genetics
Volume	204
Issue number	1
DOIs	https://doi.org/10.1534/genetics.116.190462
State	Published - Sep 2016
Externally published	Yes

Keywords

Distribution of fitness effects
Experimental design
Experimental evolution
Mutation
Population genetics

ASJC Scopus subject areas

Genetics

Access to Document

10.1534/genetics.116.190462

Cite this

@article{534e9602cb664ade8c70b17f4bf6bd57,

title = "A statistical guide to the design of deep mutational scanning experiments",

abstract = "The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95\% confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates.",

keywords = "Distribution of fitness effects, Experimental design, Experimental evolution, Mutation, Population genetics",

author = "Matuszewski, Sebastian and Hildebrandt, {Marcel E.} and Ghenu, {Ana Hermina} and Jensen, {Jeffrey D.} and Bank, Claudia",

note = "Publisher Copyright: {\textcopyright} 2016 by the Genetics Society of America.",

year = "2016",

month = sep,

doi = "10.1534/genetics.116.190462",

language = "English (US)",

volume = "204",

pages = "77--87",

journal = "Genetics",

issn = "0016-6731",

publisher = "Genetics Society of America",

number = "1",

}

TY  - JOUR
T1  - A statistical guide to the design of deep mutational scanning experiments
AU  - Matuszewski, Sebastian
AU  - Hildebrandt, Marcel E.
AU  - Ghenu, Ana Hermina
AU  - Jensen, Jeffrey D.
AU  - Bank, Claudia
PY  - 2016/9
Y1  - 2016/9
N2  - The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95% confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates.
AB  - The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95% confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates.
KW  - Distribution of fitness effects
KW  - Experimental design
KW  - Experimental evolution
KW  - Mutation
KW  - Population genetics
UR  - http://www.scopus.com/inward/record.url?scp=84986254038&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84986254038&partnerID=8YFLogxK
U2  - 10.1534/genetics.116.190462
DO  - 10.1534/genetics.116.190462
M3  - Article
C2  - 27412710
AN  - SCOPUS:84986254038
SN  - 0016-6731
VL  - 204
SP  - 77
EP  - 87
JO  - Genetics
JF  - Genetics
IS  - 1
ER  - 
