When is it biased? Assessing the representativeness of twitter's streaming API

Fred Morstatter; Jürgen Pfeffer; Huan Liu

doi:10.1145/2567948.2576952

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

89 Scopus citations

Abstract

Twitter shares a free 1% sample of its tweets through the "Streaming API". Recently, research has pointed to evidence of bias in this source. The methodologies proposed in previous work rely on the restrictive and expensive Firehose to find the bias in the Streaming API data. We tackle the problem of finding sample bias without costly and restrictive Firehose data. We propose a solution that focuses on using an open data source to find bias in the Streaming API.

Original language	English (US)
Title of host publication	WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web
Publisher	Association for Computing Machinery, Inc
Pages	555-556
Number of pages	2
ISBN (Electronic)	9781450327459
DOIs	https://doi.org/10.1145/2567948.2576952
State	Published - Apr 7 2014
Event	23rd International Conference on World Wide Web, WWW 2014 - Seoul, Korea, Republic of. Duration: Apr 7 2014 → Apr 11 2014

Publication series

Name	WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web

Other

Other	23rd International Conference on World Wide Web, WWW 2014
Country/Territory	Korea, Republic of
City	Seoul
Period	4/7/14 → 4/11/14

Keywords

Big data
Data sampling
Sampling bias
Twitter analysis

ASJC Scopus subject areas

Computer Networks and Communications
Software

Access to Document

10.1145/2567948.2576952

Cite this

Morstatter, F., Pfeffer, J., & Liu, H. (2014). When is it biased? Assessing the representativeness of twitter's streaming API. In WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web (pp. 555-556). (WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web). Association for Computing Machinery, Inc. https://doi.org/10.1145/2567948.2576952

When is it biased? Assessing the representativeness of twitter's streaming API. / Morstatter, Fred; Pfeffer, Jürgen; Liu, Huan.
WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web. Association for Computing Machinery, Inc, 2014. p. 555-556 (WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web).

Morstatter, F, Pfeffer, J & Liu, H 2014, When is it biased? Assessing the representativeness of twitter's streaming API. in WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web. WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web, Association for Computing Machinery, Inc, pp. 555-556, 23rd International Conference on World Wide Web, WWW 2014, Seoul, Korea, Republic of, 4/7/14. https://doi.org/10.1145/2567948.2576952

Morstatter F, Pfeffer J, Liu H. When is it biased? Assessing the representativeness of twitter's streaming API. In WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web. Association for Computing Machinery, Inc. 2014. p. 555-556. (WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web). doi: 10.1145/2567948.2576952

Morstatter, Fred ; Pfeffer, Jürgen ; Liu, Huan. / When is it biased? Assessing the representativeness of twitter's streaming API. WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web. Association for Computing Machinery, Inc, 2014. pp. 555-556 (WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web).

@inproceedings{b757d13d792e4e4f985531ff23633649,

title = "When is it biased? Assessing the representativeness of twitter's streaming API",

abstract = "Twitter shares a free 1\% sample of its tweets through the {"}Streaming API{"}. Recently, research has pointed to evidence of bias in this source. The methodologies proposed in previous work rely on the restrictive and expensive Firehose to find the bias in the Streaming API data. We tackle the problem of finding sample bias without costly and restrictive Firehose data. We propose a solution that focuses on using an open data source to find bias in the Streaming API.",

keywords = "Big data, Data sampling, Sampling bias, Twitter analysis",

author = "Fred Morstatter and J{\"u}rgen Pfeffer and Huan Liu",

note = "Funding Information: This work is sponsored, in part, by Office of Naval Research grants N000141010091 and N000141110527. The full version of this paper can be found here: http://arxiv.org/abs/1401.7909. Publisher Copyright: {\textcopyright} Copyright 2014 by the International World Wide Web Conferences Steering Committee.; 23rd International Conference on World Wide Web, WWW 2014 ; Conference date: 07-04-2014 Through 11-04-2014",

year = "2014",

month = apr,

day = "7",

doi = "10.1145/2567948.2576952",

language = "English (US)",

series = "WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web",

publisher = "Association for Computing Machinery, Inc",

pages = "555--556",

booktitle = "WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web",

}

TY - GEN

T1 - When is it biased? Assessing the representativeness of twitter's streaming API

AU - Morstatter, Fred

AU - Pfeffer, Jürgen

AU - Liu, Huan

N1 - Funding Information: This work is sponsored, in part, by Office of Naval Research grants N000141010091 and N000141110527. The full version of this paper can be found here: http://arxiv.org/abs/1401.7909. Publisher Copyright: © Copyright 2014 by the International World Wide Web Conferences Steering Committee.

PY - 2014/4/7

Y1 - 2014/4/7

N2 - Twitter shares a free 1% sample of its tweets through the "Streaming API". Recently, research has pointed to evidence of bias in this source. The methodologies proposed in previous work rely on the restrictive and expensive Firehose to find the bias in the Streaming API data. We tackle the problem of finding sample bias without costly and restrictive Firehose data. We propose a solution that focuses on using an open data source to find bias in the Streaming API.

AB - Twitter shares a free 1% sample of its tweets through the "Streaming API". Recently, research has pointed to evidence of bias in this source. The methodologies proposed in previous work rely on the restrictive and expensive Firehose to find the bias in the Streaming API data. We tackle the problem of finding sample bias without costly and restrictive Firehose data. We propose a solution that focuses on using an open data source to find bias in the Streaming API.

KW - Big data

KW - Data sampling

KW - Sampling bias

KW - Twitter analysis

UR - http://www.scopus.com/inward/record.url?scp=84990955096&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84990955096&partnerID=8YFLogxK

U2 - 10.1145/2567948.2576952

DO - 10.1145/2567948.2576952

M3 - Conference contribution

AN - SCOPUS:84990955096

T3 - WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web

SP - 555

EP - 556

BT - WWW 2014 Companion - Proceedings of the 23rd International Conference on World Wide Web

PB - Association for Computing Machinery, Inc

T2 - 23rd International Conference on World Wide Web, WWW 2014

Y2 - 7 April 2014 through 11 April 2014

ER -
