Modeling sparsely clustered data: Design-based, model-based, and single-level methods

Research output: Contribution to journal › Article

42 Citations (Scopus)

Abstract

Recent studies have investigated the small sample properties of models for clustered data, such as multilevel models and generalized estimating equations. These studies have focused on parameter bias when the number of clusters is small, but very few studies have addressed the methods' properties with sparse data: a small number of observations within each cluster. In particular, studies have yet to address the properties of generalized estimating equations, a possible alternative to multilevel models often overlooked in behavioral sciences, with sparse data. This article begins with a discussion of population-averaged and cluster-specific models, provides a brief overview of both multilevel models and generalized estimating equations, and then conducts a simulation study on the sparse data properties of generalized estimating equations, multilevel models, and single-level regression models for both normal and binary outcomes. The simulation found that generalized estimating equations estimate regression coefficients and their standard errors without bias with as few as 2 observations per cluster, provided that the number of clusters was reasonably large. Similar to the previous studies, multilevel models tended to overestimate the between-cluster variance components when the cluster size was below about 5.
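
The kind of comparison the abstract describes can be illustrated with a small sketch. This is not the article's simulation design: the function names, the number of clusters (500), the cluster size (2), and the variance and coefficient values are all assumptions chosen for illustration. The sketch uses a normal outcome and an independence working correlation, under which the generalized estimating equation point estimates coincide with ordinary least squares and the cluster-robust sandwich estimator supplies the standard errors. A real analysis would more likely use a dedicated GEE routine, possibly with an exchangeable working correlation.

```python
import numpy as np


def simulate_sparse_clusters(n_clusters=500, cluster_size=2, intercept=1.0,
                             slope=0.5, tau=1.0, sigma=1.0, seed=0):
    """Simulate a normal outcome with a random intercept per cluster.

    All default values are illustrative assumptions, not the article's design.
    """
    rng = np.random.default_rng(seed)
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    # Cluster-level predictor: constant within a cluster, so ignoring the
    # clustering understates the uncertainty in its coefficient.
    x = rng.normal(size=n_clusters)[cluster]
    u = rng.normal(scale=tau, size=n_clusters)[cluster]   # random intercepts
    e = rng.normal(scale=sigma, size=cluster.size)        # residual noise
    y = intercept + slope * x + u + e
    return cluster, x, y


def gee_independence(cluster, x, y):
    """Identity-link GEE with an independence working correlation.

    Returns the coefficients, the naive single-level (OLS) standard errors,
    and the cluster-robust sandwich standard errors.
    """
    X = np.column_stack([np.ones_like(x), x])
    n, p = X.shape
    bread = np.linalg.inv(X.T @ X)
    coef = bread @ X.T @ y
    resid = y - X @ coef

    # Naive covariance: treats all observations as independent.
    naive_cov = bread * (resid @ resid) / (n - p)

    # Sandwich covariance: sum the score contributions within each cluster,
    # then form bread @ meat @ bread.
    scores = X * resid[:, None]
    cluster_scores = np.zeros((cluster.max() + 1, p))
    np.add.at(cluster_scores, cluster, scores)
    meat = cluster_scores.T @ cluster_scores
    robust_cov = bread @ meat @ bread

    return coef, np.sqrt(np.diag(naive_cov)), np.sqrt(np.diag(robust_cov))


cluster, x, y = simulate_sparse_clusters()
coef, naive_se, robust_se = gee_independence(cluster, x, y)
print("slope estimate:", coef[1])
print("naive SE:", naive_se[1])
print("cluster-robust SE:", robust_se[1])
```

With only two observations per cluster but many clusters, the slope estimate should land near the generating value, while the naive single-level standard error should come out smaller than the cluster-robust one because it ignores the within-cluster dependence. Repeating the run over many seeds and comparing the average robust standard error with the empirical spread of the estimates would be one way to check bias in the spirit of the study.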

Original language: English (US)
Pages (from-to): 552-563
Number of pages: 12
Journal: Psychological Methods
Volume: 19
Issue number: 4
DOIs: 10.1037/met0000024
State: Published - Jan 1 2014
Externally published: Yes

Fingerprint

Behavioral Sciences

Keywords

  • GEE
  • HLM
  • Multilevel model
  • Small sample
  • Sparse data

ASJC Scopus subject areas

  • Psychology (miscellaneous)

Cite this

Modeling sparsely clustered data: Design-based, model-based, and single-level methods. / McNeish, Daniel.

In: Psychological Methods, Vol. 19, No. 4, 01.01.2014, p. 552-563.

@article{43585e276af644d5b172f11e26c6449c,
title = "Modeling sparsely clustered data: Design-based, model-based, and single-level methods",
abstract = "Recent studies have investigated the small sample properties of models for clustered data, such as multilevel models and generalized estimating equations. These studies have focused on parameter bias when the number of clusters is small, but very few studies have addressed the methods' properties with sparse data: a small number of observations within each cluster. In particular, studies have yet to address the properties of generalized estimating equations, a possible alternative to multilevel models often overlooked in behavioral sciences, with sparse data. This article begins with a discussion of population-averaged and cluster-specific models, provides a brief overview of both multilevel models and generalized estimating equations, and then conducts a simulation study on the sparse data properties of generalized estimating equations, multilevel models, and single-level regression models for both normal and binary outcomes. The simulation found that generalized estimating equations estimate regression coefficients and their standard errors without bias with as few as 2 observations per cluster, provided that the number of clusters was reasonably large. Similar to the previous studies, multilevel models tended to overestimate the between-cluster variance components when the cluster size was below about 5.",
keywords = "GEE, HLM, Multilevel model, Small sample, Sparse data",
author = "Daniel McNeish",
year = "2014",
month = "1",
day = "1",
doi = "10.1037/met0000024",
language = "English (US)",
volume = "19",
pages = "552--563",
journal = "Psychological Methods",
issn = "1082-989X",
publisher = "American Psychological Association Inc.",
number = "4",
}

TY - JOUR

T1 - Modeling sparsely clustered data

T2 - Design-based, model-based, and single-level methods

AU - McNeish, Daniel

PY - 2014/1/1

Y1 - 2014/1/1

N2 - Recent studies have investigated the small sample properties of models for clustered data, such as multilevel models and generalized estimating equations. These studies have focused on parameter bias when the number of clusters is small, but very few studies have addressed the methods' properties with sparse data: a small number of observations within each cluster. In particular, studies have yet to address the properties of generalized estimating equations, a possible alternative to multilevel models often overlooked in behavioral sciences, with sparse data. This article begins with a discussion of population-averaged and cluster-specific models, provides a brief overview of both multilevel models and generalized estimating equations, and then conducts a simulation study on the sparse data properties of generalized estimating equations, multilevel models, and single-level regression models for both normal and binary outcomes. The simulation found that generalized estimating equations estimate regression coefficients and their standard errors without bias with as few as 2 observations per cluster, provided that the number of clusters was reasonably large. Similar to the previous studies, multilevel models tended to overestimate the between-cluster variance components when the cluster size was below about 5.

AB - Recent studies have investigated the small sample properties of models for clustered data, such as multilevel models and generalized estimating equations. These studies have focused on parameter bias when the number of clusters is small, but very few studies have addressed the methods' properties with sparse data: a small number of observations within each cluster. In particular, studies have yet to address the properties of generalized estimating equations, a possible alternative to multilevel models often overlooked in behavioral sciences, with sparse data. This article begins with a discussion of population-averaged and cluster-specific models, provides a brief overview of both multilevel models and generalized estimating equations, and then conducts a simulation study on the sparse data properties of generalized estimating equations, multilevel models, and single-level regression models for both normal and binary outcomes. The simulation found that generalized estimating equations estimate regression coefficients and their standard errors without bias with as few as 2 observations per cluster, provided that the number of clusters was reasonably large. Similar to the previous studies, multilevel models tended to overestimate the between-cluster variance components when the cluster size was below about 5.

KW - GEE

KW - HLM

KW - Multilevel model

KW - Small sample

KW - Sparse data

UR - http://www.scopus.com/inward/record.url?scp=84925614904&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84925614904&partnerID=8YFLogxK

U2 - 10.1037/met0000024

DO - 10.1037/met0000024

M3 - Article

C2 - 25110903

AN - SCOPUS:84925614904

VL - 19

SP - 552

EP - 563

JO - Psychological Methods

JF - Psychological Methods

SN - 1082-989X

IS - 4

ER -