Toward early and order-of-magnitude cascade prediction in social networks

Ruocheng Guo; Elham Shaabani; Abhinav Bhatnagar; Paulo Shakarian

doi:10.1007/s13278-016-0372-7

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to “viral” proportions, where “viral” can be defined as an order-of-magnitude increase? However, several previous studies have established that cascade size and frequency are related through a power law, which leads to a severe imbalance in this classification problem. In this paper, we devise a suite of measurements based on “structural diversity”: the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate that these measures are able to distinguish viral from non-viral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with precision of 0.69 and recall of 0.52 for the viral class, even though this class comprises under 2% of samples. This significantly outperforms our baseline approach as well as the current state of the art. We also show that this approach performs well for identifying whether cascades observed for 60 min will grow to 500 reposts, and we demonstrate how to trade off precision against recall.
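The precision and recall figures quoted in the abstract, and the trade-off between them, can be illustrated with a minimal Python sketch. This is not the authors' classifier or data: the `scores` and `labels` lists are hypothetical toy values, and the function names are chosen here for illustration only. The sketch shows how moving a decision threshold on a classifier's score shifts precision and recall for a rare positive ("viral") class.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], labels: List[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall for the positive (viral) class at a score threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical toy data: 2 viral cascades (label 1) among 10 samples.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.15, 0.10, 0.05, 0.02]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# A strict threshold favors precision; a looser one favors recall.
for threshold in (0.9, 0.5):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")

# F1 derived here from the abstract's reported precision and recall.
print(f"F1 at precision 0.69, recall 0.52: {f1(0.69, 0.52):.2f}")
```

On the toy data, the strict threshold flags a single cascade, which raises precision and lowers recall, while the looser threshold catches both viral cascades at the cost of one false positive. The F1 value in the last line is computed from the two numbers in the abstract and is not a figure reported on this page.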

Original language	English (US)
Article number	64
Journal	Social Network Analysis and Mining
Volume	6
Issue number	1
DOIs	https://doi.org/10.1007/s13278-016-0372-7
State	Published - Dec 1 2016

Keywords

Cascade prediction
Diffusion in social networks
Information diffusion
Social network analysis

ASJC Scopus subject areas

Computer Science Applications
Human-Computer Interaction
Information Systems
Communication
Media Technology

Cite this

@article{7a1f054bc4c14dcba27ca95aeba152d6,
title = "Toward early and order-of-magnitude cascade prediction in social networks",
abstract = "When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to “viral” proportions—where “viral” can be defined as an order-of-magnitude increase. However, several previous studies have established that cascade size and frequency are related through a power law—which leads to a severe imbalance in this classification problem. In this paper, we devise a suite of measurements based on “structural diversity”—the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate these measures are able to distinguish viral from non-viral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with precision of 0.69 and recall of 0.52 for the viral class—despite this class comprising under 2 % of samples. This significantly outperforms our baseline approach as well as the current state of the art. We also show this approach also performs well for identifying whether cascades observed for 60 min will grow to 500 reposts as well as demonstrate how we can trade-off between precision and recall.",

keywords = "Cascade prediction, Diffusion in social networks, Information diffusion, Social network analysis",
author = "Ruocheng Guo and Elham Shaabani and Abhinav Bhatnagar and Paulo Shakarian",
year = "2016",
month = dec,
day = "1",
doi = "10.1007/s13278-016-0372-7",
language = "English (US)",
volume = "6",
journal = "Social Network Analysis and Mining",
issn = "1869-5450",
publisher = "Springer Wien",
number = "1",
}

TY - JOUR
T1 - Toward early and order-of-magnitude cascade prediction in social networks
AU - Guo, Ruocheng
AU - Shaabani, Elham
AU - Bhatnagar, Abhinav
AU - Shakarian, Paulo
PY - 2016/12/1
Y1 - 2016/12/1
N2 - When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to “viral” proportions—where “viral” can be defined as an order-of-magnitude increase. However, several previous studies have established that cascade size and frequency are related through a power law—which leads to a severe imbalance in this classification problem. In this paper, we devise a suite of measurements based on “structural diversity”—the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate these measures are able to distinguish viral from non-viral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with precision of 0.69 and recall of 0.52 for the viral class—despite this class comprising under 2 % of samples. This significantly outperforms our baseline approach as well as the current state of the art. We also show this approach also performs well for identifying whether cascades observed for 60 min will grow to 500 reposts as well as demonstrate how we can trade-off between precision and recall.

KW - Cascade prediction
KW - Diffusion in social networks
KW - Information diffusion
KW - Social network analysis
UR - http://www.scopus.com/inward/record.url?scp=84984622260&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84984622260&partnerID=8YFLogxK
U2 - 10.1007/s13278-016-0372-7
DO - 10.1007/s13278-016-0372-7
M3 - Article
AN - SCOPUS:84984622260
SN - 1869-5450
VL - 6
JO - Social Network Analysis and Mining
JF - Social Network Analysis and Mining
IS - 1
M1 - 64
ER -
