Analysis of Interactions and Nonlinear Effects with Missing Data: A Factored Regression Modeling Approach Using Maximum Likelihood Estimation

Oliver Lüdtke, Alexander Robitzsch, Stephen West

Research output: Contribution to journal › Article

Abstract

When estimating multiple regression models with incomplete predictor variables, it is necessary to specify a joint distribution for the predictor variables. A convenient assumption is that this distribution is a multivariate normal distribution, which is also the default in many statistical software packages. This distribution will in general be misspecified if predictors with missing data have nonlinear effects (e.g., x²) or are included in interaction terms (e.g., x·z). In the present article, we introduce a factored regression modeling approach for estimating regression models with missing data that is based on maximum likelihood estimation. In this approach, the model likelihood is factorized into a part that is due to the model of interest and a part that is due to the model for the incomplete predictors. In three simulation studies, we showed that the factored regression modeling approach produced valid estimates of interaction and nonlinear effects in regression models with missing values on categorical or continuous predictor variables under a broad range of conditions. We developed the R package mdmb, which facilitates a user-friendly application of the factored regression modeling approach, and present a real-data example that illustrates the flexibility of the software.
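
The factorization described in the abstract can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the article: y is the outcome, x a predictor with missing values, z a completely observed covariate, θ the parameters of the model of interest (which may include terms such as x² or x·z), and φ the parameters of the model for the incomplete predictor.

```latex
% Joint density for one case, factored into the model of interest
% and the model for the incomplete predictor (notation assumed):
f(y, x \mid z;\, \theta, \phi) = f(y \mid x, z;\, \theta)\, f(x \mid z;\, \phi)

% Likelihood contribution of a case i whose x is missing:
% the unobserved x is integrated out of the factored density.
L_i(\theta, \phi) = \int f(y_i \mid x, z_i;\, \theta)\, f(x \mid z_i;\, \phi)\, dx
```

For a categorical x, the integral would be replaced by a sum over the categories of x. Under this sketch the nonlinear or interaction terms enter only through the first factor, so no joint multivariate normal distribution for y and x needs to be assumed.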

Original language: English (US)
Journal: Multivariate Behavioral Research
DOI: 10.1080/00273171.2019.1640104
State: Accepted/In press - Jan 1 2019

Keywords

  • interaction effects
  • maximum likelihood estimation
  • missing data
  • multiple regression

ASJC Scopus subject areas

  • Statistics and Probability
  • Experimental and Cognitive Psychology
  • Arts and Humanities (miscellaneous)

Cite this

@article{2fd5432b316b414a9ca62781dfe50260,
title = "Analysis of Interactions and Nonlinear Effects with Missing Data: A Factored Regression Modeling Approach Using Maximum Likelihood Estimation",
abstract = "When estimating multiple regression models with incomplete predictor variables, it is necessary to specify a joint distribution for the predictor variables. A convenient assumption is that this distribution is a multivariate normal distribution, which is also the default in many statistical software packages. This distribution will in general be misspecified if predictors with missing data have nonlinear effects (e.g., x²) or are included in interaction terms (e.g., x·z). In the present article, we introduce a factored regression modeling approach for estimating regression models with missing data that is based on maximum likelihood estimation. In this approach, the model likelihood is factorized into a part that is due to the model of interest and a part that is due to the model for the incomplete predictors. In three simulation studies, we showed that the factored regression modeling approach produced valid estimates of interaction and nonlinear effects in regression models with missing values on categorical or continuous predictor variables under a broad range of conditions. We developed the R package mdmb, which facilitates a user-friendly application of the factored regression modeling approach, and present a real-data example that illustrates the flexibility of the software.",
keywords = "interaction effects, maximum likelihood estimation, missing data, multiple regression",
author = "Oliver L{\"u}dtke and Alexander Robitzsch and Stephen West",
year = "2019",
month = "1",
day = "1",
doi = "10.1080/00273171.2019.1640104",
language = "English (US)",
journal = "Multivariate Behavioral Research",
issn = "0027-3171",
publisher = "Psychology Press Ltd",
}

TY  - JOUR
T1  - Analysis of Interactions and Nonlinear Effects with Missing Data
T2  - A Factored Regression Modeling Approach Using Maximum Likelihood Estimation
AU  - Lüdtke, Oliver
AU  - Robitzsch, Alexander
AU  - West, Stephen
PY  - 2019/1/1
Y1  - 2019/1/1
AB  - When estimating multiple regression models with incomplete predictor variables, it is necessary to specify a joint distribution for the predictor variables. A convenient assumption is that this distribution is a multivariate normal distribution, which is also the default in many statistical software packages. This distribution will in general be misspecified if predictors with missing data have nonlinear effects (e.g., x²) or are included in interaction terms (e.g., x·z). In the present article, we introduce a factored regression modeling approach for estimating regression models with missing data that is based on maximum likelihood estimation. In this approach, the model likelihood is factorized into a part that is due to the model of interest and a part that is due to the model for the incomplete predictors. In three simulation studies, we showed that the factored regression modeling approach produced valid estimates of interaction and nonlinear effects in regression models with missing values on categorical or continuous predictor variables under a broad range of conditions. We developed the R package mdmb, which facilitates a user-friendly application of the factored regression modeling approach, and present a real-data example that illustrates the flexibility of the software.
KW  - interaction effects
KW  - maximum likelihood estimation
KW  - missing data
KW  - multiple regression
UR  - http://www.scopus.com/inward/record.url?scp=85070237530&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85070237530&partnerID=8YFLogxK
U2  - 10.1080/00273171.2019.1640104
DO  - 10.1080/00273171.2019.1640104
M3  - Article
AN  - SCOPUS:85070237530
JO  - Multivariate Behavioral Research
JF  - Multivariate Behavioral Research
SN  - 0027-3171
ER  -