Assessing the effect of selection at the amino acid level in malaria antigen sequences through Bayesian generalized linear models

Daniel Merl, Raquel Prado, Ananías A. Escalante

Research output: Contribution to journal, Article

Abstract

We present a statistical approach for identifying residues in DNA sequences for which diversity may be maintained by natural selection. Bayesian generalized linear models (GLMs) are used to describe patterns of mutation in a DNA sequence alignment. Posterior distributions of key quantities, such as probabilities of nonsynonymous and synonymous mutations per site, are studied. Inference in this class of models is achieved through customary Markov chain Monte Carlo methods. Model selection is dealt with by means of a minimum posterior predictive loss approach. We describe how information on the evolutionary process underlying the sequences can be formally incorporated into the models through structured priors. The proposed methodology was designed to analyze several DNA sequences encoding the vaccine candidate apical membrane antigen-1 (AMA-1) of the human malaria parasite Plasmodium falciparum. The study of genetic variability in antigen sequences is relevant to determining whether a particular antigen is a viable target for a vaccine construct. Using a simulation study, we first compare the GLM-based approach to existing methods for detecting sites under selection that are based on stochastic models of sequence evolution. We then apply the proposed models to the AMA-1 sequence data, which allows us to identify residues with the greatest disparities between nonsynonymous and synonymous changes. Recent experimental evidence suggests that several of these residues are immunologically relevant, indicating that the proposed models may be used predictively to identify functionally significant residues in antigens for which experimental results are not yet available.
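The abstract does not give the model specification, so the following is only a minimal sketch of the general technique it names: a Bayesian GLM for per-site mutation counts, fitted by Markov chain Monte Carlo. It assumes a binomial likelihood with a logit link for the number of nonsynonymous changes among all changes at a site, independent normal priors on the coefficients, and a random-walk Metropolis sampler. The data in `TOY_SITES`, the single covariate `x`, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical toy data, NOT from the paper:
# (covariate x, total changes n, nonsynonymous changes y) for each site.
TOY_SITES = [(0, 10, 3), (0, 8, 2), (0, 12, 4), (1, 10, 8), (1, 9, 7), (1, 11, 9)]


def log1pexp(eta):
    """Numerically stable log(1 + exp(eta))."""
    if eta > 0:
        return eta + math.log1p(math.exp(-eta))
    return math.log1p(math.exp(eta))


def log_posterior(beta, sites, prior_sd=2.0):
    """Unnormalized log posterior of a binomial GLM with logit link.

    Assumed model: y ~ Binomial(n, p), logit(p) = beta[0] + beta[1] * x,
    with independent Normal(0, prior_sd^2) priors on both coefficients.
    """
    log_prior = -sum(b * b for b in beta) / (2.0 * prior_sd ** 2)
    log_lik = 0.0
    for x, n, y in sites:
        eta = beta[0] + beta[1] * x
        # Binomial log-likelihood up to a constant: y*eta - n*log(1 + exp(eta)).
        log_lik += y * eta - n * log1pexp(eta)
    return log_prior + log_lik


def metropolis(sites, n_iter=5000, step=0.3, seed=0):
    """Random-walk Metropolis sampler; returns draws after discarding the first half."""
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    current = log_posterior(beta, sites)
    draws = []
    for _ in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, sites)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        draws.append(list(beta))
    return draws[n_iter // 2:]


def posterior_mean(draws):
    """Posterior mean of each coefficient from the retained draws."""
    return [sum(d[j] for d in draws) / len(draws) for j in range(2)]
```

Calling `posterior_mean(metropolis(TOY_SITES))` returns estimates of the intercept and slope. In this toy data a positive slope would indicate that sites with `x = 1` have a higher probability of nonsynonymous change, which is the kind of per-site disparity the abstract describes examining through posterior distributions.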

Original language: English (US)
Pages (from-to): 1496-1507
Number of pages: 12
Journal: Journal of the American Statistical Association
Volume: 103
Issue number: 484
DOI: 10.1198/016214508000000850
State: Published - Dec 2008

Keywords

  • Bayesian generalized linear model
  • DNA sequence data
  • Malaria antigens
  • Model comparison
  • Mutation count data
  • Natural selection
  • Structured priors

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Assessing the effect of selection at the amino acid level in malaria antigen sequences through Bayesian generalized linear models. / Merl, Daniel; Prado, Raquel; Escalante, Ananías A.

In: Journal of the American Statistical Association, Vol. 103, No. 484, Dec. 2008, p. 1496-1507.

@article{8073a2f7b0794506b1b326bb9a74ed50,
title = "Assessing the effect of selection at the amino acid level in malaria antigen sequences through Bayesian generalized linear models",
keywords = "Bayesian generalized linear model, DNA sequence data, Malaria antigens, Model comparison, Mutation count data, Natural selection, Structured priors",
author = "Merl, Daniel and Prado, Raquel and Escalante, {Anan{\'i}as A.}",
year = "2008",
month = "12",
doi = "10.1198/016214508000000850",
language = "English (US)",
volume = "103",
pages = "1496--1507",
journal = "Journal of the American Statistical Association",
issn = "0162-1459",
publisher = "Taylor and Francis Ltd.",
number = "484",

}

TY - JOUR

T1 - Assessing the effect of selection at the amino acid level in malaria antigen sequences through Bayesian generalized linear models

AU - Merl, Daniel

AU - Prado, Raquel

AU - Escalante, Ananías A.

PY - 2008/12

Y1 - 2008/12

KW - Bayesian generalized linear model

KW - DNA sequence data

KW - Malaria antigens

KW - Model comparison

KW - Mutation count data

KW - Natural selection

KW - Structured priors

UR - http://www.scopus.com/inward/record.url?scp=78649615997&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78649615997&partnerID=8YFLogxK

U2 - 10.1198/016214508000000850

DO - 10.1198/016214508000000850

M3 - Article

AN - SCOPUS:78649615997

VL - 103

SP - 1496

EP - 1507

JO - Journal of the American Statistical Association

JF - Journal of the American Statistical Association

SN - 0162-1459

IS - 484

ER -