Separation in D-optimal experimental designs for the logistic regression model

Research output: Contribution to journal, Article

Abstract

The D-optimality criterion is often used in computer-generated experimental designs when the response of interest is binary, such as when the attribute of interest can be categorized as pass or fail. The majority of methods in the generation of D-optimal designs focus on logistic regression as the base model for relating a set of experimental factors with the binary response. Despite the advances in computational algorithms for calculating D-optimal designs for the logistic regression model, very few have acknowledged the problem of separation, a phenomenon where the responses are perfectly separable by a hyperplane in the design space. Separation causes one or more parameters of the logistic regression model to be inestimable via maximum likelihood estimation. The objective of this paper is to investigate the tendency of computer-generated, nonsequential D-optimal designs to yield separation in small-sample experimental data. Sets of local D-optimal and Bayesian D-optimal designs with different run (sample) sizes are generated for several “ground truth” logistic regression models. A Monte Carlo simulation methodology is then used to estimate the probability of separation for each design. Results of the simulation study confirm that separation occurs frequently in small-sample data and that separation is more likely to occur when the ground truth model has interaction and quadratic terms. Finally, the paper illustrates that different designs with identical run sizes created from the same model can have significantly different chances of encountering separation.
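The Monte Carlo step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes a single-factor logistic model with an intercept, so the separating hyperplane reduces to a cut point on the factor axis, and the design points, parameter values, and function names (`is_separated`, `separation_probability`) are hypothetical choices made for the example.

```python
import math
import random


def is_separated(x, y):
    """Return True if one-factor binary data show complete or quasi-complete separation.

    Assumption: the model has an intercept and a single factor, so separation
    means every failure lies at or to one side of every success on the x axis.
    """
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    if not zeros or not ones:
        # All responses identical: the intercept estimate diverges.
        return True
    return max(zeros) <= min(ones) or max(ones) <= min(zeros)


def separation_probability(design, beta0, beta1, n_sim=10_000, seed=0):
    """Estimate by simulation the chance that a design yields separated data.

    `design` is a list of factor settings (one per run); `beta0` and `beta1`
    are the assumed "ground truth" parameters of the logistic model.
    """
    rng = random.Random(seed)
    # Success probability at each run under the assumed true model.
    probs = [1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi))) for xi in design]
    hits = 0
    for _ in range(n_sim):
        y = [1 if rng.random() < p else 0 for p in probs]
        if is_separated(design, y):
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    # Hypothetical two-point designs with 4 and 16 runs on the same support.
    small = [-1.5, 1.5] * 2
    large = [-1.5, 1.5] * 8
    print("4 runs :", separation_probability(small, 0.0, 1.0))
    print("16 runs:", separation_probability(large, 0.0, 1.0))
```

For models with several factors, interactions, or quadratic terms, the cut-point test no longer applies, and a general separation check (for example, one based on linear programming) would be needed in place of `is_separated`.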

Original language: English (US)
Journal: Quality and Reliability Engineering International
DOI: 10.1002/qre.2411
State: Accepted/In press - Jan 1 2018

Keywords

  • D-optimal
  • experimental design
  • logistic regression model
  • nonlinear
  • separation

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Management Science and Operations Research

Cite this

@article{0e4f14d5eb04404c98d797b3b449d3ba,
title = "Separation in D-optimal experimental designs for the logistic regression model",
abstract = "The D-optimality criterion is often used in computer-generated experimental designs when the response of interest is binary, such as when the attribute of interest can be categorized as pass or fail. The majority of methods in the generation of D-optimal designs focus on logistic regression as the base model for relating a set of experimental factors with the binary response. Despite the advances in computational algorithms for calculating D-optimal designs for the logistic regression model, very few have acknowledged the problem of separation, a phenomenon where the responses are perfectly separable by a hyperplane in the design space. Separation causes one or more parameters of the logistic regression model to be inestimable via maximum likelihood estimation. The objective of this paper is to investigate the tendency of computer-generated, nonsequential D-optimal designs to yield separation in small-sample experimental data. Sets of local D-optimal and Bayesian D-optimal designs with different run (sample) sizes are generated for several “ground truth” logistic regression models. A Monte Carlo simulation methodology is then used to estimate the probability of separation for each design. Results of the simulation study confirm that separation occurs frequently in small-sample data and that separation is more likely to occur when the ground truth model has interaction and quadratic terms. Finally, the paper illustrates that different designs with identical run sizes created from the same model can have significantly different chances of encountering separation.",
keywords = "D-optimal, experimental design, logistic regression model, nonlinear, separation",
author = "Park, {Anson R.} and Mancenido, Michelle and Montgomery, Douglas",
year = "2018",
month = "1",
day = "1",
doi = "10.1002/qre.2411",
language = "English (US)",
journal = "Quality and Reliability Engineering International",
issn = "0748-8017",
publisher = "John Wiley and Sons Ltd"
}

TY - JOUR

T1 - Separation in D-optimal experimental designs for the logistic regression model

AU - Park, Anson R.

AU - Mancenido, Michelle

AU - Montgomery, Douglas

PY - 2018/1/1

Y1 - 2018/1/1

AB - The D-optimality criterion is often used in computer-generated experimental designs when the response of interest is binary, such as when the attribute of interest can be categorized as pass or fail. The majority of methods in the generation of D-optimal designs focus on logistic regression as the base model for relating a set of experimental factors with the binary response. Despite the advances in computational algorithms for calculating D-optimal designs for the logistic regression model, very few have acknowledged the problem of separation, a phenomenon where the responses are perfectly separable by a hyperplane in the design space. Separation causes one or more parameters of the logistic regression model to be inestimable via maximum likelihood estimation. The objective of this paper is to investigate the tendency of computer-generated, nonsequential D-optimal designs to yield separation in small-sample experimental data. Sets of local D-optimal and Bayesian D-optimal designs with different run (sample) sizes are generated for several “ground truth” logistic regression models. A Monte Carlo simulation methodology is then used to estimate the probability of separation for each design. Results of the simulation study confirm that separation occurs frequently in small-sample data and that separation is more likely to occur when the ground truth model has interaction and quadratic terms. Finally, the paper illustrates that different designs with identical run sizes created from the same model can have significantly different chances of encountering separation.

KW - D-optimal

KW - experimental design

KW - logistic regression model

KW - nonlinear

KW - separation

UR - http://www.scopus.com/inward/record.url?scp=85055046990&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055046990&partnerID=8YFLogxK

U2 - 10.1002/qre.2411

DO - 10.1002/qre.2411

M3 - Article

AN - SCOPUS:85055046990

JO - Quality and Reliability Engineering International

JF - Quality and Reliability Engineering International

SN - 0748-8017

ER -