Propensity score analysis with missing data

Heining Cham, Stephen West

Research output: Contribution to journal › Article › peer-review

34 Scopus citations


Propensity score analysis is a method that equates treatment and control groups on a comprehensive set of measured confounders in observational (nonrandomized) studies. A successful propensity score analysis reduces bias in the estimate of the average treatment effect in a nonrandomized study, making the estimate more comparable with that obtained from a randomized experiment. This article reviews and discusses an important practical issue in propensity score analysis: the baseline covariates (potential confounders) and the outcome may be incompletely observed (i.e., have missing values). We review the statistical theory of propensity score analysis and estimation methods for propensity scores with incompletely observed covariates. Both traditional logistic regression and modern machine learning methods (e.g., random forests, generalized boosted modeling) are reviewed as estimation methods for incompletely observed covariates. Balance diagnostics and equating methods for incompletely observed covariates are briefly described. Using an empirical example, we illustrate and compare the propensity score estimation methods for incompletely observed covariates.
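As a rough illustration of what estimating propensity scores with incompletely observed covariates can look like, the sketch below fits a logistic regression after adding missingness indicators, which is one simple approach found in the literature and not necessarily the one the article recommends. The function names (`add_missing_indicators`, `fit_logistic`, `propensity_scores`), the simulated data, and the mean-fill step are all assumptions made for this example; only NumPy is used.

```python
import numpy as np

def add_missing_indicators(X):
    """Mean-fill missing values and append a 0/1 indicator per incomplete column."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)          # means of the observed values
    filled = np.where(miss, col_means, X)
    has_missing = miss.any(axis=0)             # indicators only where needed
    return np.hstack([filled, miss[:, has_missing].astype(float)])

def fit_logistic(X, z, n_iter=50, ridge=1e-6):
    """Newton-Raphson logistic regression with an intercept.

    A tiny ridge term (applied to all coefficients, intercept included)
    keeps the Hessian invertible; it is a numerical convenience only.
    """
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        w = p * (1.0 - p)
        grad = A.T @ (z - p) - ridge * beta
        hess = (A * w[:, None]).T @ A + ridge * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def propensity_scores(X, z):
    """Estimated probability of treatment given observed values and missingness pattern."""
    design = add_missing_indicators(X)
    beta = fit_logistic(design, z)
    A = np.hstack([np.ones((design.shape[0], 1)), design])
    return 1.0 / (1.0 + np.exp(-A @ beta))

# Simulated (hypothetical) data: two covariates, the second partly missing.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
p_true = 1.0 / (1.0 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
z = (rng.random(n) < p_true).astype(float)     # treatment assignment
X[rng.random(n) < 0.2, 1] = np.nan             # about 20% missing in covariate 2

scores = propensity_scores(X, z)
```

The resulting `scores` could then feed matching, stratification, or weighting, with covariate balance checked afterwards. Multiple imputation or tree-based learners that handle missing values natively are alternatives this sketch does not cover.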

Original language: English (US)
Pages (from-to): 427-445
Number of pages: 19
Journal: Psychological Methods
Issue number: 3
State: Published - Sep 1 2016


Keywords

  • Machine learning
  • Missing data
  • Nonrandomization
  • Propensity score

ASJC Scopus subject areas

  • Psychology (miscellaneous)

