Recursive partitioning, also known as decision trees or classification and regression trees (CART), is a machine learning procedure that has gained traction in the behavioral sciences because of its ability to search for nonlinear and interactive effects and to produce interpretable predictive models. The recursive partitioning algorithm is greedy: at each split it searches for the variable and splitting value that maximize outcome homogeneity. As a result, the algorithm can be overly sensitive to chance associations in the data, particularly in small samples. To limit such chance associations, we propose and evaluate a reliability-based cost function for recursive partitioning. The reliability-based cost function increases the likelihood of selecting variables that are more reliable, which should therefore have more consistent associations with the outcome of interest. Two reliability-based cost functions are proposed, evaluated through simulation, and compared to the standard CART algorithm. Results indicate that reliability-based cost functions can be beneficial, particularly with smaller samples and when more reliable variables are important to the prediction, but they can overlook important associations between the outcome and lower-reliability predictors. The use of these cost functions is illustrated with data on depression and suicidal ideation from the National Longitudinal Survey of Youth.
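The greedy split search described above can be sketched in a few lines. The abstract does not specify the exact form of the reliability-based cost functions, so the multiplicative weighting below (impurity reduction scaled by each predictor's reliability, e.g., a Cronbach's alpha in (0, 1]) is an illustrative assumption, not the paper's method; the function and variable names are hypothetical.

```python
def variance(y):
    """Node impurity for a continuous outcome (variance of y)."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / len(y)

def best_split(X, y, reliability):
    """Greedy CART-style split search with a reliability-weighted gain.

    X           : dict mapping variable name -> list of predictor values
    y           : list of outcome values
    reliability : dict mapping variable name -> reliability estimate in (0, 1]

    Returns (weighted_gain, variable, threshold) for the best split,
    or None if no valid split exists.
    """
    n = len(y)
    base = variance(y)
    best = None
    for var, values in X.items():
        # Candidate thresholds: all but the largest distinct value.
        for threshold in sorted(set(values))[:-1]:
            left = [y[i] for i in range(n) if values[i] <= threshold]
            right = [y[i] for i in range(n) if values[i] > threshold]
            if not left or not right:
                continue
            # Standard impurity reduction (homogeneity gain) for this split.
            gain = (base
                    - (len(left) / n) * variance(left)
                    - (len(right) / n) * variance(right))
            # Assumed reliability-based cost: down-weight gains on
            # less reliable predictors so they are chosen less often.
            weighted = gain * reliability[var]
            if best is None or weighted > best[0]:
                best = (weighted, var, threshold)
    return best
```

For example, with two predictors that split a small sample equally well, setting `reliability = {"reliable": 0.9, "noisy": 0.5}` biases the search toward the more reliable variable, which is the behavior the proposed cost functions are intended to produce.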
Keywords
- Machine learning
ASJC Scopus subject areas
- Statistics and Probability
- Experimental and Cognitive Psychology
- Arts and Humanities (miscellaneous)