Modeling sparsely clustered data: Design-based, model-based, and single-level methods

Research output: Contribution to journal › Article › peer-review

76 Scopus citations

Abstract

Recent studies have investigated the small-sample properties of models for clustered data, such as multilevel models and generalized estimating equations (GEE). These studies have focused on parameter bias when the number of clusters is small, but very few have addressed the methods' properties with sparse data, that is, a small number of observations within each cluster. In particular, studies have yet to address the sparse-data properties of generalized estimating equations, an alternative to multilevel models that is often overlooked in the behavioral sciences. This article begins with a discussion of population-averaged and cluster-specific models, provides a brief overview of both multilevel models and generalized estimating equations, and then reports a simulation study of the sparse-data properties of generalized estimating equations, multilevel models, and single-level regression models for both normal and binary outcomes. The simulation found that generalized estimating equations estimated regression coefficients and their standard errors without bias with as few as 2 observations per cluster, provided that the number of clusters was reasonably large. Consistent with previous studies, multilevel models tended to overestimate the between-cluster variance components when the cluster size was below about 5.
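To illustrate why single-level regression misbehaves with clustered data while a design-based (GEE-style) standard error does not, here is a minimal sketch of one simulation condition. It assumes a normal outcome, a cluster-level predictor, 2 observations per cluster, 200 clusters, and the parameter values shown; all of these, and the function names `simulate_clusters` and `ols_with_cluster_robust_se`, are hypothetical choices for illustration, not the article's actual design. The code fits ordinary least squares and reports both the naive model-based SE and a cluster-robust sandwich SE. For an identity link with an independence working correlation, GEE point estimates equal OLS and the GEE robust SE equals this sandwich (without small-sample corrections); the article's GEE specification may differ, for example by using an exchangeable working correlation.

```python
import math
import random
import statistics


def simulate_clusters(n_clusters, cluster_size, beta0=0.0, beta1=0.5,
                      tau=1.0, sigma=1.0, seed=None):
    """Simulate y_ij = beta0 + beta1 * x_j + u_j + e_ij (hypothetical design).

    x_j is a cluster-level predictor, u_j ~ N(0, tau^2) is a random intercept
    shared by every observation in cluster j, and e_ij ~ N(0, sigma^2).
    Returns a list of (cluster_id, x, y) tuples.
    """
    rng = random.Random(seed)
    data = []
    for j in range(n_clusters):
        x_j = rng.gauss(0.0, 1.0)
        u_j = rng.gauss(0.0, tau)
        for _ in range(cluster_size):
            y = beta0 + beta1 * x_j + u_j + rng.gauss(0.0, sigma)
            data.append((j, x_j, y))
    return data


def ols_with_cluster_robust_se(data):
    """Fit y = b0 + b1 * x by OLS; return (b1, naive SE, cluster-robust SE)."""
    n = len(data)
    xs = [x for _, x, _ in data]
    ys = [y for _, _, y in data]
    # (X'X)^{-1} for the design matrix with columns [1, x]
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx
    a11, a12, a22 = sxx / det, -sx / det, n / det
    # b = (X'X)^{-1} X'y
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b0 = a11 * sy + a12 * sxy
    b1 = a12 * sy + a22 * sxy
    resid = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    # Naive model-based SE: treats every observation as independent
    s2 = sum(r * r for r in resid) / (n - 2)
    naive_se = math.sqrt(s2 * a22)
    # Sandwich "meat": sum over clusters of the outer product of X_j' r_j
    scores = {}
    for (j, x, _), r in zip(data, resid):
        g0, g1 = scores.get(j, (0.0, 0.0))
        scores[j] = (g0 + r, g1 + x * r)
    m00 = sum(g0 * g0 for g0, _ in scores.values())
    m01 = sum(g0 * g1 for g0, g1 in scores.values())
    m11 = sum(g1 * g1 for _, g1 in scores.values())
    # Var(b1) = c' M c, where c is the slope row of (X'X)^{-1}
    c0, c1 = a12, a22
    robust_var = c0 * c0 * m00 + 2 * c0 * c1 * m01 + c1 * c1 * m11
    return b1, naive_se, math.sqrt(robust_var)


if __name__ == "__main__":
    slopes, naive_ses, robust_ses = [], [], []
    for rep in range(200):
        # 200 clusters with only 2 observations each: sparse clustering
        sample = simulate_clusters(n_clusters=200, cluster_size=2, seed=rep)
        b1, se_naive, se_robust = ols_with_cluster_robust_se(sample)
        slopes.append(b1)
        naive_ses.append(se_naive)
        robust_ses.append(se_robust)
    print("mean slope:    ", round(statistics.mean(slopes), 3))
    print("empirical SD:  ", round(statistics.stdev(slopes), 3))
    print("mean naive SE: ", round(statistics.mean(naive_ses), 3))
    print("mean robust SE:", round(statistics.mean(robust_ses), 3))
```

Under these assumed values the intraclass correlation is tau^2 / (tau^2 + sigma^2) = 0.5, so for a cluster-level predictor with clusters of size 2 the naive variance is expected to be too small by a design effect of roughly 1 + (2 - 1)(0.5) = 1.5, while the robust SE should track the empirical SD of the slopes more closely. The sketch does not cover the binary-outcome or multilevel-model conditions that the article also examines.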

Original language: English (US)
Pages (from-to): 552-563
Number of pages: 12
Journal: Psychological Methods
Volume: 19
Issue number: 4
State: Published - Dec 1 2014
Externally published: Yes

Keywords

  • GEE
  • HLM
  • Multilevel model
  • Small sample
  • Sparse data

ASJC Scopus subject areas

  • Psychology (miscellaneous)

