Using Lasso for Predictor Selection and to Assuage Overfitting: A Method Long Overlooked in Behavioral Sciences

Daniel M. McNeish

doi:10.1080/00273171.2015.1036965

Research output: Contribution to journal › Article › peer-review

227 Scopus citations

Abstract

Ordinary least squares and stepwise selection are widespread in behavioral science research; however, these methods are well known to encounter overfitting problems such that R² and regression coefficients may be inflated while standard errors and p values may be deflated, ultimately reducing both the parsimony of the model and the generalizability of conclusions. More optimal methods for selecting predictors and estimating regression coefficients, such as regularization methods (e.g., Lasso), have existed for decades, are widely implemented in other disciplines, and are available in mainstream software, yet these methods are essentially invisible in the behavioral science literature while the use of suboptimal methods continues to proliferate. This paper discusses potential issues with standard statistical models, provides an introduction to regularization with specific details on both Lasso and its related predecessor ridge regression, provides an example analysis and code for running a Lasso analysis in R and SAS, and discusses limitations and related methods.
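The contrast the abstract draws between Lasso and ridge regression can be illustrated with a minimal sketch. This is not the paper's R or SAS code: it is a plain-Python coordinate-descent illustration on simulated data, and the function names (`soft_threshold`, `coordinate_descent`), the penalty value `lam = 0.3`, and the data-generating model are all assumptions chosen for the example. It assumes roughly centered predictors on a comparable scale and fits no intercept.

```python
import random


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; return exactly 0.0 when |rho| <= lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def coordinate_descent(X, y, lam, penalty, sweeps=200):
    """Minimize (1/(2n)) * RSS plus a penalty, one coefficient at a time.

    penalty="lasso" adds lam * sum(|b_j|);
    penalty="ridge" adds (lam / 2) * sum(b_j ** 2).
    No intercept is fitted (illustrative simplification).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residuals y - Xb; b starts at zero
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Covariance of predictor j with the residual that leaves j out
            rho = sum(v * r for v, r in zip(col, resid)) / n + z * b[j]
            if penalty == "lasso":
                new = soft_threshold(rho, lam) / z
            else:
                new = rho / (z + lam)
            # Keep the residuals in sync with the updated coefficient
            delta = new - b[j]
            resid = [r - v * delta for r, v in zip(resid, col)]
            b[j] = new
    return b


# Simulated data (assumption): only the first two of five predictors matter.
rng = random.Random(0)
n = 200
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5) for row in X]

lasso = coordinate_descent(X, y, lam=0.3, penalty="lasso")
ridge = coordinate_descent(X, y, lam=0.3, penalty="ridge")

print("lasso:", [round(c, 3) for c in lasso])
print("ridge:", [round(c, 3) for c in ridge])
```

In this setup the Lasso penalty sets the coefficients of the irrelevant predictors to exactly zero, which is what makes it usable for predictor selection, whereas ridge only shrinks them toward zero. In practice the penalty would be chosen by cross-validation rather than fixed in advance.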

Original language	English (US)
Pages (from-to)	471-484
Number of pages	14
Journal	Multivariate Behavioral Research
Volume	50
Issue number	5
DOIs	https://doi.org/10.1080/00273171.2015.1036965
State	Published - Sep 3 2015
Externally published	Yes

Keywords

lasso
overfitting
regression
regularization

ASJC Scopus subject areas

Statistics and Probability
Experimental and Cognitive Psychology
Arts and Humanities (miscellaneous)

Cite this

@article{0f84ebfdc58e4405b8d57a0b73de0a81,

title = "Using Lasso for Predictor Selection and to Assuage Overfitting: A Method Long Overlooked in Behavioral Sciences",

abstract = "Ordinary least squares and stepwise selection are widespread in behavioral science research; however, these methods are well known to encounter overfitting problems such that $R^2$ and regression coefficients may be inflated while standard errors and p values may be deflated, ultimately reducing both the parsimony of the model and the generalizability of conclusions. More optimal methods for selecting predictors and estimating regression coefficients, such as regularization methods (e.g., Lasso), have existed for decades, are widely implemented in other disciplines, and are available in mainstream software, yet these methods are essentially invisible in the behavioral science literature while the use of suboptimal methods continues to proliferate. This paper discusses potential issues with standard statistical models, provides an introduction to regularization with specific details on both Lasso and its related predecessor ridge regression, provides an example analysis and code for running a Lasso analysis in R and SAS, and discusses limitations and related methods.",

keywords = "lasso, overfitting, regression, regularization",

author = "McNeish, {Daniel M.}",

note = "Publisher Copyright: {\textcopyright} 2015, Copyright {\textcopyright} Taylor \& Francis Group, LLC.",

year = "2015",

month = sep,

day = "3",

doi = "10.1080/00273171.2015.1036965",

language = "English (US)",

volume = "50",

pages = "471--484",

journal = "Multivariate Behavioral Research",

issn = "0027-3171",

publisher = "Psychology Press Ltd",

number = "5",

}

TY - JOUR

T1 - Using Lasso for Predictor Selection and to Assuage Overfitting

T2 - A Method Long Overlooked in Behavioral Sciences

AU - McNeish, Daniel M.

PY - 2015/9/3

Y1 - 2015/9/3

AB - Ordinary least squares and stepwise selection are widespread in behavioral science research; however, these methods are well known to encounter overfitting problems such that R² and regression coefficients may be inflated while standard errors and p values may be deflated, ultimately reducing both the parsimony of the model and the generalizability of conclusions. More optimal methods for selecting predictors and estimating regression coefficients, such as regularization methods (e.g., Lasso), have existed for decades, are widely implemented in other disciplines, and are available in mainstream software, yet these methods are essentially invisible in the behavioral science literature while the use of suboptimal methods continues to proliferate. This paper discusses potential issues with standard statistical models, provides an introduction to regularization with specific details on both Lasso and its related predecessor ridge regression, provides an example analysis and code for running a Lasso analysis in R and SAS, and discusses limitations and related methods.

KW - lasso

KW - overfitting

KW - regression

KW - regularization

UR - http://www.scopus.com/inward/record.url?scp=84944058260&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84944058260&partnerID=8YFLogxK

U2 - 10.1080/00273171.2015.1036965

DO - 10.1080/00273171.2015.1036965

M3 - Article

C2 - 26610247

AN - SCOPUS:84944058260

SN - 0027-3171

VL - 50

SP - 471

EP - 484

JO - Multivariate Behavioral Research

JF - Multivariate Behavioral Research

IS - 5

ER -
