Working within a black box: Transparency in the collection and production of big Twitter data

Kevin Driscoll, Shawn Walker

Research output: Contribution to journal › Article

89 Scopus citations


Twitter seems to provide a ready source of data for researchers interested in public opinion and popular communication. Indeed, tweets are routinely integrated into the visual presentation of news and scholarly publishing in the form of summary statistics, tables, and charts provided by commercial analytics software. Without a clear description of how the underlying data were collected, stored, cleaned, and analyzed, however, readers cannot assess their validity. To illustrate the critical importance of evaluating the production of Twitter data, we offer a systematic comparison of two common sources of tweets: the publicly accessible Streaming API and the "fire hose" provided by Gnip PowerTrack. This study represents an important step toward higher standards for the reporting of social media research.

Original language: English (US)
Pages (from-to): 1745-1764
Number of pages: 20
Journal: International Journal of Communication
Issue number: 1
State: Published - Jan 1 2014
Externally published: Yes

Keywords



  • Big data
  • Data
  • Methodology
  • Social media

ASJC Scopus subject areas

  • Communication
