TY - JOUR
T1 - A statistical guide to the design of deep mutational scanning experiments
AU - Matuszewski, Sebastian
AU - Hildebrandt, Marcel E.
AU - Ghenu, Ana Hermina
AU - Jensen, Jeffrey D.
AU - Bank, Claudia
N1 - Publisher Copyright: © 2016 by the Genetics Society of America.
PY - 2016/9
Y1 - 2016/9
N2 - The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates.
AB - The characterization of the distribution of mutational effects is a key goal in evolutionary biology. Recently developed deep-sequencing approaches allow for accurate and simultaneous estimation of the fitness effects of hundreds of engineered mutations by monitoring their relative abundance across time points in a single bulk competition. Naturally, the achievable resolution of the estimated fitness effects depends on the specific experimental setup, the organism and type of mutations studied, and the sequencing technology utilized, among other factors. By means of analytical approximations and simulations, we provide guidelines for optimizing time-sampled deep-sequencing bulk competition experiments, focusing on the number of mutants, the sequencing depth, and the number of sampled time points. Our analytical results show that sampling more time points together with extending the duration of the experiment improves the achievable precision disproportionately compared with increasing the sequencing depth or reducing the number of competing mutants. Even if the duration of the experiment is fixed, sampling more time points and clustering these at the beginning and the end of the experiment increase experimental power and allow for efficient and precise assessment of the entire range of selection coefficients. Finally, we provide a formula for calculating the 95%-confidence interval for the measurement error estimate, which we implement as an interactive web tool. This allows for quantification of the maximum expected a priori precision of the experimental setup, as well as for a statistical threshold for determining deviations from neutrality for specific selection coefficient estimates.
KW - Distribution of fitness effects
KW - Experimental design
KW - Experimental evolution
KW - Mutation
KW - Population genetics
UR - http://www.scopus.com/inward/record.url?scp=84986254038&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84986254038&partnerID=8YFLogxK
U2 - 10.1534/genetics.116.190462
DO - 10.1534/genetics.116.190462
M3 - Article
C2 - 27412710
AN - SCOPUS:84986254038
SN - 0016-6731
VL - 204
SP - 77
EP - 87
JO - Genetics
JF - Genetics
IS - 1
ER -