The D-optimality criterion is often used in computer-generated experimental designs when the response of interest is binary, such as when the attribute of interest can be categorized as pass or fail. The majority of methods in the generation of D-optimal designs focus on logistic regression as the base model for relating a set of experimental factors with the binary response. Despite the advances in computational algorithms for calculating D-optimal designs for the logistic regression model, very few have acknowledged the problem of separation, a phenomenon where the responses are perfectly separable by a hyperplane in the design space. Separation causes one or more parameters of the logistic regression model to be inestimable via maximum likelihood estimation. The objective of this paper is to investigate the tendency of computer-generated, nonsequential D-optimal designs to yield separation in small-sample experimental data. Sets of local D-optimal and Bayesian D-optimal designs with different run (sample) sizes are generated for several “ground truth” logistic regression models. A Monte Carlo simulation methodology is then used to estimate the probability of separation for each design. Results of the simulation study confirm that separation occurs frequently in small-sample data and that separation is more likely to occur when the ground truth model has interaction and quadratic terms. Finally, the paper illustrates that different designs with identical run sizes created from the same model can have significantly different chances of encountering separation.
- experimental design
- logistic regression model
ASJC Scopus subject areas
- Safety, Risk, Reliability and Quality
- Management Science and Operations Research