Working within a black box

Transparency in the collection and production of big Twitter data

Kevin Driscoll, Shawn Walker

Research output: Contribution to journal › Article

73 Citations (Scopus)

Abstract

Twitter seems to provide a ready source of data for researchers interested in public opinion and popular communication. Indeed, tweets are routinely integrated into the visual presentation of news and scholarly publishing in the form of summary statistics, tables, and charts provided by commercial analytics software. Without a clear description of how the underlying data were collected, stored, cleaned, and analyzed, however, readers cannot assess their validity. To illustrate the critical importance of evaluating the production of Twitter data, we offer a systematic comparison of two common sources of tweets: the publicly accessible Streaming API and the "fire hose" provided by Gnip PowerTrack. This study represents an important step toward higher standards for the reporting of social media research.

Original language: English (US)
Pages (from-to): 1745-1764
Number of pages: 20
Journal: International Journal of Communication
Volume: 8
Issue number: 1
State: Published - Jan 1 2014
Externally published: Yes

Keywords

  • Big data
  • Data
  • Methodology
  • Social media

ASJC Scopus subject areas

  • Communication

Cite this

Working within a black box: Transparency in the collection and production of big Twitter data. / Driscoll, Kevin; Walker, Shawn.

In: International Journal of Communication, Vol. 8, No. 1, 01.01.2014, p. 1745-1764.

@article{ecd34096ee5c4ca0871020334031ace0,
title = "Working within a black box: Transparency in the collection and production of big {T}witter data",
abstract = "Twitter seems to provide a ready source of data for researchers interested in public opinion and popular communication. Indeed, tweets are routinely integrated into the visual presentation of news and scholarly publishing in the form of summary statistics, tables, and charts provided by commercial analytics software. Without a clear description of how the underlying data were collected, stored, cleaned, and analyzed, however, readers cannot assess their validity. To illustrate the critical importance of evaluating the production of Twitter data, we offer a systematic comparison of two common sources of tweets: the publicly accessible Streaming API and the {"}fire hose{"} provided by Gnip PowerTrack. This study represents an important step toward higher standards for the reporting of social media research.",
keywords = "Big data, Data, Methodology, Social media",
author = "Kevin Driscoll and Shawn Walker",
year = "2014",
month = "1",
day = "1",
language = "English (US)",
volume = "8",
pages = "1745--1764",
journal = "International Journal of Communication",
issn = "1932-8036",
publisher = "USC Annenberg School for Communication \& Journalism",
number = "1",

}

TY - JOUR

T1 - Working within a black box

T2 - Transparency in the collection and production of big Twitter data

AU - Driscoll, Kevin

AU - Walker, Shawn

PY - 2014/1/1

Y1 - 2014/1/1

AB - Twitter seems to provide a ready source of data for researchers interested in public opinion and popular communication. Indeed, tweets are routinely integrated into the visual presentation of news and scholarly publishing in the form of summary statistics, tables, and charts provided by commercial analytics software. Without a clear description of how the underlying data were collected, stored, cleaned, and analyzed, however, readers cannot assess their validity. To illustrate the critical importance of evaluating the production of Twitter data, we offer a systematic comparison of two common sources of tweets: the publicly accessible Streaming API and the "fire hose" provided by Gnip PowerTrack. This study represents an important step toward higher standards for the reporting of social media research.

KW - Big data

KW - Data

KW - Methodology

KW - Social media

UR - http://www.scopus.com/inward/record.url?scp=85011592158&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85011592158&partnerID=8YFLogxK

M3 - Article

VL - 8

SP - 1745

EP - 1764

JO - International Journal of Communication

JF - International Journal of Communication

SN - 1932-8036

IS - 1

ER -