Parallel Bayesian Additive Regression Trees

Matthew T. Pratola, Hugh A. Chipman, James R. Gattiker, David M. Higdon, Robert McCulloch, William N. Rust

Research output: Contribution to journal › Article

15 Citations (Scopus)

Abstract

Bayesian additive regression trees (BART) is a Bayesian approach to flexible nonlinear regression that has been shown to be competitive with the best modern predictive methods, such as those based on bagging and boosting. BART offers some advantages: for example, the stochastic search Markov chain Monte Carlo (MCMC) algorithm can provide a more complete search of the model space, and variation across MCMC draws can capture the level of uncertainty in the usual Bayesian way. The BART prior is robust in that reasonable results are typically obtained with a default prior specification. However, the publicly available implementation of the BART algorithm in the R package BayesTree is not fast enough to be considered interactive with more than a thousand observations, and is unlikely even to run with 50,000 to 100,000 observations. In this article we show how the BART algorithm may be modified and then computed using single program, multiple data (SPMD) parallel computation implemented with the Message Passing Interface (MPI) library. The approach scales nearly linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit on any single data repository.
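The SPMD pattern the abstract describes — every process runs the same program on its own shard of the data, and only low-dimensional sufficient statistics are exchanged per MCMC tree move — can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the Gaussian-leaf sufficient statistics (counts, residual sums, and sums of squares), and the simulated four-rank split are all assumptions, and a plain Python reduction stands in for `MPI_Allreduce`.

```python
# Sketch of the SPMD idea behind parallel BART (illustrative assumptions,
# not the paper's code): each "rank" holds a shard of the data and computes
# per-leaf sufficient statistics for a proposed tree split; a reduction
# (MPI_Allreduce in a real MPI program) combines them, so the Metropolis-
# Hastings step sees the same statistics a serial run would.
import numpy as np

def local_suff_stats(x_shard, resid_shard, split_value):
    """Per-shard sufficient statistics for one proposed split:
    [count, sum of residuals, sum of squared residuals] per child leaf."""
    left = x_shard <= split_value
    stats = {}
    for name, mask in (("left", left), ("right", ~left)):
        r = resid_shard[mask]
        stats[name] = np.array([mask.sum(), r.sum(), (r ** 2).sum()])
    return stats

def allreduce(per_rank_stats):
    """Stand-in for MPI_Allreduce: element-wise sum across ranks."""
    total = {"left": np.zeros(3), "right": np.zeros(3)}
    for s in per_rank_stats:
        for leaf in total:
            total[leaf] += s[leaf]
    return total

rng = np.random.default_rng(0)
x = rng.uniform(size=1000)
resid = np.where(x <= 0.5, -1.0, 1.0) + 0.1 * rng.normal(size=1000)

# SPMD: the same code runs on every shard (here, 4 simulated ranks).
shards = zip(np.array_split(x, 4), np.array_split(resid, 4))
per_rank = [local_suff_stats(xs, rs, 0.5) for xs, rs in shards]
combined = allreduce(per_rank)

# The combined statistics match a serial computation on the full data,
# so parallelization leaves the MH acceptance ratio unchanged.
serial = local_suff_stats(x, resid, 0.5)
print(np.allclose(combined["left"], serial["left"]))
print(np.allclose(combined["right"], serial["right"]))
```

Because each tree move needs only these fixed-size statistics, communication cost is independent of the number of observations per shard, which is consistent with the near-linear scaling in processor cores the abstract reports.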

Original language: English (US)
Pages (from-to): 830-852
Number of pages: 23
Journal: Journal of Computational and Graphical Statistics
Volume: 23
Issue number: 3
DOI: 10.1080/10618600.2013.841584
State: Published - Jul 3 2014
Externally published: Yes


Keywords

  • Big Data
  • Markov chain Monte Carlo
  • Nonlinear
  • Scalable
  • Statistical computing

ASJC Scopus subject areas

  • Discrete Mathematics and Combinatorics
  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Pratola, M. T., Chipman, H. A., Gattiker, J. R., Higdon, D. M., McCulloch, R., & Rust, W. N. (2014). Parallel Bayesian Additive Regression Trees. Journal of Computational and Graphical Statistics, 23(3), 830-852. https://doi.org/10.1080/10618600.2013.841584
