Reliable Trees: Reliability Informed Recursive Partitioning for Psychological Data

Kevin J. Grimm, Ross Jacobucci

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Recursive partitioning, also known as decision trees and classification and regression trees (CART), is a machine learning procedure that has gained traction in the behavioral sciences because it can search for nonlinear and interactive effects and produce interpretable predictive models. The recursive partitioning algorithm is greedy: at each step it searches for the variable and splitting value that maximize outcome homogeneity. Thus, the algorithm can be overly sensitive to chance associations in the data, particularly in small samples. In an effort to limit chance associations, we propose and evaluate a reliability-based cost function for recursive partitioning. The reliability-based cost function increases the likelihood of selecting variables that are more reliable, which should have more consistent associations with the outcome of interest. Two reliability-based cost functions are proposed, evaluated through simulation, and compared to the CART algorithm. Results indicate that reliability-based cost functions can be beneficial, particularly with smaller samples and when more reliable variables are important to the prediction, but can overlook important associations between the outcome and lower-reliability predictors. The use of these cost functions is illustrated using data on depression and suicidal ideation from the National Longitudinal Survey of Youth.
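The greedy split search described in the abstract can be sketched in a few lines. The abstract does not state the form of the two proposed cost functions, so the weighting below (multiplying each candidate split's reduction in squared error by the predictor's reliability) is only a hypothetical stand-in, not the authors' method. The function names, the toy data, and the reliability values 0.9 and 0.5 are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def sse(y: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not y:
        return 0.0
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)


def best_split(
    X: Dict[str, List[float]],
    y: List[float],
    reliability: Optional[Dict[str, float]] = None,
) -> Tuple[float, str, float]:
    """Greedy search at a single node: return (score, variable, cut).

    With reliability=None this is the usual CART-style criterion for a
    regression tree (largest reduction in squared error). Otherwise each
    gain is multiplied by the predictor's reliability, which is an
    ASSUMED, purely illustrative weighting.
    """
    total = sse(y)
    best = None
    for name, values in X.items():
        weight = 1.0 if reliability is None else reliability[name]
        # Candidate cuts: every observed value except the largest.
        for cut in sorted(set(values))[:-1]:
            left = [yi for xi, yi in zip(values, y) if xi <= cut]
            right = [yi for xi, yi in zip(values, y) if xi > cut]
            gain = total - sse(left) - sse(right)
            score = weight * gain
            if best is None or score > best[0]:
                best = (score, name, cut)
    if best is None:
        raise ValueError("no predictor has more than one distinct value")
    return best


# Toy data (hypothetical): "noisy" happens to separate the outcome
# perfectly in this small sample; "reliable" separates it less cleanly.
y = [1, 2, 3, 7, 8, 9]
X = {
    "reliable": [1, 2, 4, 3, 5, 6],
    "noisy": [0, 0, 0, 1, 1, 1],
}

plain = best_split(X, y)
weighted = best_split(X, y, reliability={"reliable": 0.9, "noisy": 0.5})
print(plain[1], weighted[1])
```

In this toy sample the unweighted search selects `noisy`, while the weighted search selects `reliable`. The same example also shows the trade-off the abstract reports: if the low-reliability predictor's association were real, down-weighting it would cause the tree to overlook it.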

Original language: English (US)
Pages (from-to): 595-607
Number of pages: 13
Journal: Multivariate Behavioral Research
Volume: 56
Issue number: 4
DOIs
State: Published - 2021

Keywords

  • CART
  • Machine learning
  • Reliability

ASJC Scopus subject areas

  • Statistics and Probability
  • Experimental and Cognitive Psychology
  • Arts and Humanities (miscellaneous)
