Can One Tamper with the Sample API?

Fred Morstatter, Harsh Dani, Justin Sampson, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Scopus citations

Abstract

While social media mining continues to be an active area of research, obtaining data for research is a perennial problem. Even more, obtaining unbiased data is a challenge for researchers who wish to study human behavior, and not technical artifacts induced by the sampling algorithm of a social media site. In this work, we evaluate one social media data outlet that gives data to its users in the form of a stream: Twitter's Sample API. We show that in its current form, this API can be poisoned by bots or spammers who wish to promote their content, jeopardizing the credibility of the data collected through this API. We design a proof-of-concept algorithm that shows how malicious users could increase the probability of their content appearing in the Sample API, thus biasing the content towards spam and bot content and harming the representativity of this data outlet.

Original language: English (US)
Title of host publication: WWW 2016 Companion - Proceedings of the 25th International Conference on World Wide Web
Publisher: Association for Computing Machinery, Inc
Pages: 81-82
Number of pages: 2
ISBN (Electronic): 9781450341448
State: Published - Apr 11 2016
Externally published: Yes
Event: 25th International Conference on World Wide Web, WWW 2016 - Montreal, Canada
Duration: May 11 2016 → May 15 2016

Publication series

Name: WWW 2016 Companion - Proceedings of the 25th International Conference on World Wide Web

Conference

Conference: 25th International Conference on World Wide Web, WWW 2016
Country/Territory: Canada
City: Montreal
Period: 5/11/16 → 5/15/16

Keywords

  • data mining
  • data sampling
  • sample bias

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
