Abstract

When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to “viral” proportions, where “viral” can be defined as an order-of-magnitude increase? However, several previous studies have established that cascade size and frequency are related through a power law, which leads to a severe class imbalance in this classification problem. In this paper, we devise a suite of measurements based on “structural diversity”: the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate that these measures are able to distinguish viral from non-viral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with precision of 0.69 and recall of 0.52 for the viral class, despite this class comprising under 2% of samples. This significantly outperforms our baseline approach as well as the current state of the art. We also show that this approach performs well for identifying whether cascades observed for 60 min will grow to 500 reposts, and we demonstrate how we can trade off between precision and recall.
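As a rough illustration of the two ideas in the abstract, the sketch below computes a simple community-diversity feature for the early adopters of a cascade and then shows how moving a decision threshold trades precision against recall. It is not the paper's method: the function names, the choice of Shannon entropy as the diversity measure, and the toy data are all assumptions made for this example, since the abstract does not specify the actual measurements.

```python
import math
from collections import Counter


def community_diversity(adopters, community_of):
    """Return (distinct community count, Shannon entropy in bits).

    `adopters` is a list of user ids in a cascade and `community_of` maps
    each user id to a community label. Both are hypothetical inputs.
    """
    counts = Counter(community_of[user] for user in adopters)
    total = sum(counts.values())
    entropy = sum(
        (c / total) * math.log2(total / c) for c in counts.values()
    )
    return len(counts), entropy


def precision_recall(scores, labels, threshold):
    """Precision and recall for the positive (viral) class at a threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    community_of = {"a": 0, "b": 0, "c": 1, "d": 2}
    print(community_diversity(["a", "b", "c", "d"], community_of))

    scores = [0.9, 0.8, 0.4, 0.2]          # hypothetical classifier scores
    labels = [True, False, True, False]    # True marks a viral cascade
    for threshold in (0.85, 0.5, 0.1):
        print(threshold, precision_recall(scores, labels, threshold))
```

On the toy data, raising the threshold yields fewer but more reliable viral predictions (higher precision), while lowering it recovers more viral cascades at the cost of false alarms (higher recall). This is the kind of trade-off the abstract refers to.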

Original language: English (US)
Article number: 64
Journal: Social Network Analysis and Mining
Volume: 6
Issue number: 1
DOIs
State: Published - Dec 1 2016

Keywords

  • Cascade prediction
  • Diffusion in social networks
  • Information diffusion
  • Social network analysis

ASJC Scopus subject areas

  • Computer Science Applications
  • Human-Computer Interaction
  • Information Systems
  • Communication
  • Media Technology

Fingerprint Dive into the research topics of 'Toward early and order-of-magnitude cascade prediction in social networks'. Together they form a unique fingerprint.
