TY - JOUR
T1 - Design of experiments and response surface methodology to tune machine learning hyperparameters, with a random forest case-study
AU - Lujan-Moreno, Gustavo A.
AU - Howard, Phillip R.
AU - Rojas, Omar G.
AU - Montgomery, Douglas C.
N1 - Publisher Copyright:
© 2018
PY - 2018/11/1
Y1 - 2018/11/1
N2 - Most machine learning algorithms possess hyperparameters. For example, an artificial neural network requires the determination of the number of hidden layers, nodes, and many other parameters related to the model fitting process. Despite this, there is still no clear consensus on how to tune them. The most popular methodology is an exhaustive grid search, which can be highly inefficient and sometimes infeasible. Another common solution is to change one hyperparameter at a time and measure its effect on the model's performance. However, this can also be inefficient and does not guarantee optimal results since it ignores interactions between the hyperparameters. In this paper, we propose to use the Design of Experiments (DOE) methodology (factorial designs) for screening and Response Surface Methodology (RSM) to tune a machine learning algorithm's hyperparameters. An application of our methodology is presented with a detailed discussion of the results of a random forest case-study using a publicly available dataset. Benefits include fewer training runs, better parameter selection, and a disciplined approach based on statistical theory.
KW - Design of experiments
KW - Hyperparameters
KW - Machine learning
KW - Random forest
KW - Response surface methodology
KW - Tuning
UR - http://www.scopus.com/inward/record.url?scp=85047768020&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047768020&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2018.05.024
DO - 10.1016/j.eswa.2018.05.024
M3 - Article
AN - SCOPUS:85047768020
SN - 0957-4174
VL - 109
SP - 195
EP - 205
JO - Expert Systems With Applications
JF - Expert Systems With Applications
ER -