Separation in D-optimal experimental designs for the logistic regression model

Anson R. Park; Michelle Mancenido; Douglas Montgomery

doi:10.1002/qre.2411

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

The D-optimality criterion is often used in computer-generated experimental designs when the response of interest is binary, such as when the attribute of interest can be categorized as pass or fail. The majority of methods in the generation of D-optimal designs focus on logistic regression as the base model for relating a set of experimental factors with the binary response. Despite the advances in computational algorithms for calculating D-optimal designs for the logistic regression model, very few have acknowledged the problem of separation, a phenomenon where the responses are perfectly separable by a hyperplane in the design space. Separation causes one or more parameters of the logistic regression model to be inestimable via maximum likelihood estimation. The objective of this paper is to investigate the tendency of computer-generated, nonsequential D-optimal designs to yield separation in small-sample experimental data. Sets of local D-optimal and Bayesian D-optimal designs with different run (sample) sizes are generated for several “ground truth” logistic regression models. A Monte Carlo simulation methodology is then used to estimate the probability of separation for each design. Results of the simulation study confirm that separation occurs frequently in small-sample data and that separation is more likely to occur when the ground truth model has interaction and quadratic terms. Finally, the paper illustrates that different designs with identical run sizes created from the same model can have significantly different chances of encountering separation.
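As a rough illustration of the Monte Carlo step described in the abstract, the Python sketch below estimates the probability of separation for a fixed design. It is not the authors' code and makes strong simplifying assumptions: the model has a single factor plus an intercept, the design contains at least two distinct points, and the design points and parameter values in the usage lines are hypothetical, not D-optimal designs or models from the paper. With one factor, separation (complete or quasi-complete) reduces to a threshold check on the factor. Models with interaction or quadratic terms, such as those studied in the paper, would need a more general check.

```python
import math
import random


def has_separation(x, y):
    """Return True if one-factor data (x_i, y_i) are completely or
    quasi-completely separated, i.e. some threshold on x puts all
    y=0 runs on one side and all y=1 runs on the other (ties allowed).
    Assumes x contains at least two distinct values."""
    ones = [xi for xi, yi in zip(x, y) if yi == 1]
    zeros = [xi for xi, yi in zip(x, y) if yi == 0]
    if not ones or not zeros:
        # All responses identical: the intercept diverges under ML.
        return True
    return max(zeros) <= min(ones) or max(ones) <= min(zeros)


def separation_probability(design, beta0, beta1, n_sims=10000, seed=0):
    """Monte Carlo estimate of P(separation) for a fixed design under an
    assumed "ground truth" model logit(p) = beta0 + beta1 * x."""
    rng = random.Random(seed)
    probs = [1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi))) for xi in design]
    hits = 0
    for _ in range(n_sims):
        # Simulate one binary response per run of the design.
        y = [1 if rng.random() < p else 0 for p in probs]
        if has_separation(design, y):
            hits += 1
    return hits / n_sims


# Hypothetical 6-run and 12-run designs on [-1, 1] (illustrative only).
small_design = [-1, -1, 0, 0, 1, 1]
large_design = small_design * 2
p_small = separation_probability(small_design, beta0=0.0, beta1=1.0)
p_large = separation_probability(large_design, beta0=0.0, beta1=1.0)
print(f"estimated P(separation), 6 runs:  {p_small:.3f}")
print(f"estimated P(separation), 12 runs: {p_large:.3f}")
```

Running the estimator on several candidate designs of the same run size, under the same assumed model, gives the kind of comparison the abstract describes. The fixed seed only makes the estimate reproducible.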

Original language	English (US)
Pages (from-to)	776-787
Number of pages	12
Journal	Quality and Reliability Engineering International
Volume	35
Issue number	3
DOIs	https://doi.org/10.1002/qre.2411
State	Published - Apr 2019

Keywords

D-optimal
experimental design
logistic regression model
nonlinear
separation

ASJC Scopus subject areas

Safety, Risk, Reliability and Quality
Management Science and Operations Research

Cite this

@article{0e4f14d5eb04404c98d797b3b449d3ba,

title = "Separation in D-optimal experimental designs for the logistic regression model",

abstract = "The D-optimality criterion is often used in computer-generated experimental designs when the response of interest is binary, such as when the attribute of interest can be categorized as pass or fail. The majority of methods in the generation of D-optimal designs focus on logistic regression as the base model for relating a set of experimental factors with the binary response. Despite the advances in computational algorithms for calculating D-optimal designs for the logistic regression model, very few have acknowledged the problem of separation, a phenomenon where the responses are perfectly separable by a hyperplane in the design space. Separation causes one or more parameters of the logistic regression model to be inestimable via maximum likelihood estimation. The objective of this paper is to investigate the tendency of computer-generated, nonsequential D-optimal designs to yield separation in small-sample experimental data. Sets of local D-optimal and Bayesian D-optimal designs with different run (sample) sizes are generated for several “ground truth” logistic regression models. A Monte Carlo simulation methodology is then used to estimate the probability of separation for each design. Results of the simulation study confirm that separation occurs frequently in small-sample data and that separation is more likely to occur when the ground truth model has interaction and quadratic terms. Finally, the paper illustrates that different designs with identical run sizes created from the same model can have significantly different chances of encountering separation.",

keywords = "D-optimal, experimental design, logistic regression model, nonlinear, separation",

author = "Park, {Anson R.} and Mancenido, Michelle and Montgomery, Douglas",

note = "Publisher Copyright: {\textcopyright} 2018 John Wiley \& Sons, Ltd.",

year = "2019",

month = apr,

doi = "10.1002/qre.2411",

language = "English (US)",

volume = "35",

pages = "776--787",

journal = "Quality and Reliability Engineering International",

issn = "0748-8017",

publisher = "John Wiley and Sons Ltd",

number = "3",

}

TY - JOUR

T1 - Separation in D-optimal experimental designs for the logistic regression model

AU - Park, Anson R.

AU - Mancenido, Michelle

AU - Montgomery, Douglas

PY - 2019/4

Y1 - 2019/4

N2 - The D-optimality criterion is often used in computer-generated experimental designs when the response of interest is binary, such as when the attribute of interest can be categorized as pass or fail. The majority of methods in the generation of D-optimal designs focus on logistic regression as the base model for relating a set of experimental factors with the binary response. Despite the advances in computational algorithms for calculating D-optimal designs for the logistic regression model, very few have acknowledged the problem of separation, a phenomenon where the responses are perfectly separable by a hyperplane in the design space. Separation causes one or more parameters of the logistic regression model to be inestimable via maximum likelihood estimation. The objective of this paper is to investigate the tendency of computer-generated, nonsequential D-optimal designs to yield separation in small-sample experimental data. Sets of local D-optimal and Bayesian D-optimal designs with different run (sample) sizes are generated for several “ground truth” logistic regression models. A Monte Carlo simulation methodology is then used to estimate the probability of separation for each design. Results of the simulation study confirm that separation occurs frequently in small-sample data and that separation is more likely to occur when the ground truth model has interaction and quadratic terms. Finally, the paper illustrates that different designs with identical run sizes created from the same model can have significantly different chances of encountering separation.

AB - The D-optimality criterion is often used in computer-generated experimental designs when the response of interest is binary, such as when the attribute of interest can be categorized as pass or fail. The majority of methods in the generation of D-optimal designs focus on logistic regression as the base model for relating a set of experimental factors with the binary response. Despite the advances in computational algorithms for calculating D-optimal designs for the logistic regression model, very few have acknowledged the problem of separation, a phenomenon where the responses are perfectly separable by a hyperplane in the design space. Separation causes one or more parameters of the logistic regression model to be inestimable via maximum likelihood estimation. The objective of this paper is to investigate the tendency of computer-generated, nonsequential D-optimal designs to yield separation in small-sample experimental data. Sets of local D-optimal and Bayesian D-optimal designs with different run (sample) sizes are generated for several “ground truth” logistic regression models. A Monte Carlo simulation methodology is then used to estimate the probability of separation for each design. Results of the simulation study confirm that separation occurs frequently in small-sample data and that separation is more likely to occur when the ground truth model has interaction and quadratic terms. Finally, the paper illustrates that different designs with identical run sizes created from the same model can have significantly different chances of encountering separation.

KW - D-optimal

KW - experimental design

KW - logistic regression model

KW - nonlinear

KW - separation

UR - http://www.scopus.com/inward/record.url?scp=85055046990&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85055046990&partnerID=8YFLogxK

U2 - 10.1002/qre.2411

DO - 10.1002/qre.2411

M3 - Article

AN - SCOPUS:85055046990

SN - 0748-8017

VL - 35

SP - 776

EP - 787

JO - Quality and Reliability Engineering International

JF - Quality and Reliability Engineering International

IS - 3

ER -
