Modeling Clustered Data with Very Few Clusters

Daniel McNeish, Laura M. Stapleton

Research output: Contribution to journal › Article

68 Citations (Scopus)

Abstract

Small-sample inference with clustered data has received increased attention recently in the methodological literature, with several simulation studies being presented on the small-sample behavior of many methods. However, nearly all previous studies focus on a single class of methods (e.g., only multilevel models, only corrections to sandwich estimators), and the differential performance of various methods that can be implemented to accommodate clustered data with very few clusters is largely unknown, potentially due to rigid disciplinary preferences. Furthermore, a majority of these studies focus on scenarios with 15 or more clusters and feature unrealistically simple data-generation models with very few predictors. This article, motivated by an applied educational psychology cluster randomized trial, presents a simulation study that simultaneously addresses the extreme small-sample setting and the differential performance (estimation bias, Type I error rates, and relative power) of 12 methods to account for clustered data, using a model that features a more realistic number of predictors. The motivating data are then modeled with each method, and results are compared. Results show that generalized estimating equations perform poorly; the choice of Bayesian prior distributions affects performance; and fixed effect models perform quite well. Limitations and implications for applications are also discussed.
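As a rough illustration of why the choice of method matters for Type I error rates with very few clusters, the following minimal sketch compares a naive regression that ignores clustering with a fixed effect model (cluster-mean centering, equivalent to adding cluster dummies). It is not the authors' simulation design: the number of clusters, the cluster size, the variance components, the single level-1 predictor, and all function names are assumptions chosen for illustration, and a normal critical value stands in for the exact t quantile.

```python
import random
from statistics import NormalDist, fmean


def simulate(n_clusters=8, cluster_size=20, rng=None):
    """Draw one clustered data set in which y does NOT depend on x.

    Both x and y carry a cluster-level random effect (assumed variance 1),
    so any rejection of the zero slope is a Type I error.
    """
    rng = rng or random.Random()
    data = []
    for j in range(n_clusters):
        u_x = rng.gauss(0, 1)  # cluster effect in the predictor
        u_y = rng.gauss(0, 1)  # cluster effect in the outcome
        for _ in range(cluster_size):
            data.append((j, u_x + rng.gauss(0, 1), u_y + rng.gauss(0, 1)))
    return data


def slope_t(xs, ys, df):
    """t statistic for the slope of a simple regression of ys on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return b / (rss / df / sxx) ** 0.5


def naive_t(data):
    """Ordinary regression that ignores the clustering entirely."""
    xs = [x for _, x, _ in data]
    ys = [y for _, _, y in data]
    return slope_t(xs, ys, len(data) - 2)


def fixed_effect_t(data):
    """Fixed effect model: center x and y within each cluster."""
    clusters = {}
    for j, x, y in data:
        clusters.setdefault(j, []).append((x, y))
    xs, ys = [], []
    for rows in clusters.values():
        mx = fmean([x for x, _ in rows])
        my = fmean([y for _, y in rows])
        xs += [x - mx for x, _ in rows]
        ys += [y - my for _, y in rows]
    # One degree of freedom per cluster intercept, plus one for the slope.
    return slope_t(xs, ys, len(data) - len(clusters) - 1)


def rejection_rates(n_reps=1000, seed=1):
    """Empirical Type I error rates (naive, fixed effect) at nominal 5%."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(0.975)  # normal approximation to the t quantile
    naive = fixed = 0
    for _ in range(n_reps):
        data = simulate(rng=rng)
        naive += abs(naive_t(data)) > crit
        fixed += abs(fixed_effect_t(data)) > crit
    return naive / n_reps, fixed / n_reps


if __name__ == "__main__":
    naive_rate, fixed_rate = rejection_rates()
    print(f"naive: {naive_rate:.3f}  fixed effect: {fixed_rate:.3f}")
```

Under these assumed settings, the naive rejection rate should land well above the nominal 5% level, while the fixed effect rate should stay close to it. The sketch covers only two simple approaches and says nothing about the multilevel, GEE, or Bayesian methods compared in the article.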

Original language: English (US)
Pages (from-to): 495-518
Number of pages: 24
Journal: Multivariate Behavioral Research
Volume: 51
Issue number: 4
DOI: 10.1080/00273171.2016.1167008
State: Published - Jul 3 2016
Externally published: Yes

Keywords

  • Bayesian
  • cluster randomized trial
  • fixed effect model
  • GEE
  • HLM
  • multilevel model
  • small sample

ASJC Scopus subject areas

  • Statistics and Probability
  • Experimental and Cognitive Psychology
  • Arts and Humanities (miscellaneous)

Cite this

Modeling Clustered Data with Very Few Clusters. / McNeish, Daniel; Stapleton, Laura M.

In: Multivariate Behavioral Research, Vol. 51, No. 4, 03.07.2016, p. 495-518.

@article{21caca590d4a410f84a52f0a9926ec52,
title = "Modeling Clustered Data with Very Few Clusters",
keywords = "Bayesian, cluster randomized trial, fixed effect model, GEE, HLM, multilevel model, small sample",
author = "McNeish, Daniel and Stapleton, {Laura M.}",
year = "2016",
month = "7",
day = "3",
doi = "10.1080/00273171.2016.1167008",
language = "English (US)",
volume = "51",
pages = "495--518",
journal = "Multivariate Behavioral Research",
issn = "0027-3171",
publisher = "Psychology Press Ltd",
number = "4",
}

TY - JOUR

T1 - Modeling Clustered Data with Very Few Clusters

AU - McNeish, Daniel

AU - Stapleton, Laura M.

PY - 2016/7/3

Y1 - 2016/7/3

N2 - Small-sample inference with clustered data has received increased attention recently in the methodological literature, with several simulation studies being presented on the small-sample behavior of many methods. However, nearly all previous studies focus on a single class of methods (e.g., only multilevel models, only corrections to sandwich estimators), and the differential performance of various methods that can be implemented to accommodate clustered data with very few clusters is largely unknown, potentially due to the rigid disciplinary preferences. Furthermore, a majority of these studies focus on scenarios with 15 or more clusters and feature unrealistically simple data-generation models with very few predictors. This article, motivated by an applied educational psychology cluster randomized trial, presents a simulation study that simultaneously addresses the extreme small sample and differential performance (estimation bias, Type I error rates, and relative power) of 12 methods to account for clustered data with a model that features a more realistic number of predictors. The motivating data are then modeled with each method, and results are compared. Results show that generalized estimating equations perform poorly; the choice of Bayesian prior distributions affects performance; and fixed effect models perform quite well. Limitations and implications for applications are also discussed.

KW - Bayesian

KW - cluster randomized trial

KW - fixed effect model

KW - GEE

KW - HLM

KW - multilevel model

KW - small sample

UR - http://www.scopus.com/inward/record.url?scp=84973596359&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84973596359&partnerID=8YFLogxK

U2 - 10.1080/00273171.2016.1167008

DO - 10.1080/00273171.2016.1167008

M3 - Article

C2 - 27269278

AN - SCOPUS:84973596359

VL - 51

SP - 495

EP - 518

JO - Multivariate Behavioral Research

JF - Multivariate Behavioral Research

SN - 0027-3171

IS - 4

ER -