TY - GEN
T1 - Toward order-of-magnitude cascade prediction
AU - Guo, Ruocheng
AU - Shaabani, Elham
AU - Bhatnagar, Abhinav
AU - Shakarian, Paulo
N1 - Funding Information: This work is supported through the AFOSR Young Investigator Program (YIP), grant number FA9550-15-1-0159.
PY - 2015/8/25
Y1 - 2015/8/25
N2 - When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to "viral" proportions, where "viral" is defined as an order-of-magnitude increase? However, several previous studies have established that cascade size and frequency are related through a power law, which leads to a severe imbalance in this classification problem. In this paper, we devise a suite of measurements based on "structural diversity" - the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate that these measures are able to distinguish viral from non-viral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with precision of 0.69 and recall of 0.52 for the viral class - despite this class comprising under 2% of samples. This significantly outperforms our baseline approach as well as the current state of the art. Our work also demonstrates how we can trade off between precision and recall.
AB - When a piece of information (microblog, photograph, video, link, etc.) starts to spread in a social network, an important question arises: will it spread to "viral" proportions, where "viral" is defined as an order-of-magnitude increase? However, several previous studies have established that cascade size and frequency are related through a power law, which leads to a severe imbalance in this classification problem. In this paper, we devise a suite of measurements based on "structural diversity" - the variety of social contexts (communities) in which individuals partaking in a given cascade engage. We demonstrate that these measures are able to distinguish viral from non-viral cascades, despite the severe imbalance of the data for this problem. Further, we leverage these measurements as features in a classification approach, successfully predicting microblogs that grow from 50 to 500 reposts with precision of 0.69 and recall of 0.52 for the viral class - despite this class comprising under 2% of samples. This significantly outperforms our baseline approach as well as the current state of the art. Our work also demonstrates how we can trade off between precision and recall.
UR - http://www.scopus.com/inward/record.url?scp=84962582467&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84962582467&partnerID=8YFLogxK
U2 - 10.1145/2808797.2809358
DO - 10.1145/2808797.2809358
M3 - Conference contribution
AN - SCOPUS:84962582467
T3 - Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015
SP - 1610
EP - 1613
BT - Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015
A2 - Pei, Jian
A2 - Tang, Jie
A2 - Silvestri, Fabrizio
PB - Association for Computing Machinery, Inc
T2 - IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2015
Y2 - 25 August 2015 through 28 August 2015
ER -