The impact of sampling on big data analysis of social media: A case study on flu and ebola

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

The explosive growth of online social networks in recent years has generated massive data sets on user behaviors, social graphs, and content. Given the scale, heterogeneity, and diversity of such big data, sampling is a simple and intuitive way to reduce the size of the data sets for collecting, measuring, and understanding users, behaviors, and traffic in online social networks. In this paper, we quantify the impact of random sampling on the analysis of online social networks, with Twitter streaming data as a case study. In addition, we design further sampling strategies, including community sampling and stratified sampling, and evaluate their impact on a broad range of behavioral characteristics of online social networks. Our experimental results show that community sampling has the least impact on tweet distributions across users and on the structure of retweeting graphs, while achieving data reductions similar to those of random and stratified sampling.
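The three sampling strategies named in the abstract can be illustrated with a minimal sketch. The abstract does not define them precisely, so everything below is an assumption rather than the authors' method: tweets are plain dictionaries with hypothetical `user` and `community` keys, stratified sampling draws the same fraction from each stratum, and community sampling keeps every tweet from a random subset of communities.

```python
import random
from collections import defaultdict


def random_sample(tweets, rate, seed=0):
    """Keep each tweet independently with probability `rate`."""
    rng = random.Random(seed)
    return [t for t in tweets if rng.random() < rate]


def stratified_sample(tweets, rate, stratum_of, seed=0):
    """Draw the same fraction from every stratum (at least one tweet each).

    `stratum_of` is a hypothetical callback mapping a tweet to its stratum,
    for example the posting user or an activity bucket.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for t in tweets:
        strata[stratum_of(t)].append(t)
    sample = []
    for members in strata.values():
        k = max(1, round(rate * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


def community_sample(tweets, rate, community_of, seed=0):
    """Keep all tweets from a randomly chosen fraction of communities.

    `community_of` is a hypothetical callback; how communities are detected
    (for example from the retweet graph) is outside this sketch.
    """
    rng = random.Random(seed)
    communities = sorted({community_of(t) for t in tweets})
    k = max(1, round(rate * len(communities)))
    kept = set(rng.sample(communities, k))
    return [t for t in tweets if community_of(t) in kept]


if __name__ == "__main__":
    # Toy data: 100 tweets, 10 users, 5 communities (all values made up).
    tweets = [{"user": f"u{i % 10}", "community": i % 5} for i in range(100)]
    print(len(random_sample(tweets, 0.2)))
    print(len(stratified_sample(tweets, 0.2, lambda t: t["user"])))
    print(len(community_sample(tweets, 0.4, lambda t: t["community"])))
```

Because community sampling keeps whole groups intact, the per-user tweet counts and retweet edges inside a kept community are untouched, which gives some intuition for the abstract's finding that it distorts those characteristics least.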

Original language: English (US)
Title of host publication: 2015 IEEE Global Communications Conference, GLOBECOM 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Print): 9781479959525
DOI: https://doi.org/10.1109/GLOCOM.2014.7416974
State: Published - Feb 23 2016
Event: 58th IEEE Global Communications Conference, GLOBECOM 2015 - San Diego, United States
Duration: Dec 6 2015 → Dec 10 2015

Other

Other: 58th IEEE Global Communications Conference, GLOBECOM 2015
Country: United States
City: San Diego
Period: 12/6/15 → 12/10/15

Fingerprint

social media
data analysis
Sampling
social network
twitter
social stratum
social behavior
community
Big data
traffic
Data reduction

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Electrical and Electronic Engineering
  • Communication

Cite this

Xu, K., Wang, F., Jia, X., & Wang, H. (2016). The impact of sampling on big data analysis of social media: A case study on flu and ebola. In 2015 IEEE Global Communications Conference, GLOBECOM 2015 [7416974] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/GLOCOM.2014.7416974

The impact of sampling on big data analysis of social media : A case study on flu and ebola. / Xu, Kuai; Wang, Feng; Jia, Xiaohua; Wang, Haiyan.

2015 IEEE Global Communications Conference, GLOBECOM 2015. Institute of Electrical and Electronics Engineers Inc., 2016. 7416974.

Xu, K, Wang, F, Jia, X & Wang, H 2016, The impact of sampling on big data analysis of social media: A case study on flu and ebola. in 2015 IEEE Global Communications Conference, GLOBECOM 2015, 7416974, Institute of Electrical and Electronics Engineers Inc., 58th IEEE Global Communications Conference, GLOBECOM 2015, San Diego, United States, 12/6/15. https://doi.org/10.1109/GLOCOM.2014.7416974
Xu K, Wang F, Jia X, Wang H. The impact of sampling on big data analysis of social media: A case study on flu and ebola. In 2015 IEEE Global Communications Conference, GLOBECOM 2015. Institute of Electrical and Electronics Engineers Inc. 2016. 7416974 https://doi.org/10.1109/GLOCOM.2014.7416974
Xu, Kuai ; Wang, Feng ; Jia, Xiaohua ; Wang, Haiyan. / The impact of sampling on big data analysis of social media : A case study on flu and ebola. 2015 IEEE Global Communications Conference, GLOBECOM 2015. Institute of Electrical and Electronics Engineers Inc., 2016.
@inproceedings{9510460cb4394dba8b08f37fec7a682d,
title = "The impact of sampling on big data analysis of social media: A case study on flu and ebola",
abstract = "The explosive growth of online social networks in recent years has generated massive data sets on user behaviors, social graphs, and content. Given the scale, heterogeneity, and diversity of such big data, sampling is a simple and intuitive way to reduce the size of the data sets for collecting, measuring, and understanding users, behaviors, and traffic in online social networks. In this paper, we quantify the impact of random sampling on the analysis of online social networks, with Twitter streaming data as a case study. In addition, we design further sampling strategies, including community sampling and stratified sampling, and evaluate their impact on a broad range of behavioral characteristics of online social networks. Our experimental results show that community sampling has the least impact on tweet distributions across users and on the structure of retweeting graphs, while achieving data reductions similar to those of random and stratified sampling.",
author = "Kuai Xu and Feng Wang and Xiaohua Jia and Haiyan Wang",
year = "2016",
month = "2",
day = "23",
doi = "10.1109/GLOCOM.2014.7416974",
language = "English (US)",
isbn = "9781479959525",
booktitle = "2015 IEEE Global Communications Conference, GLOBECOM 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - The impact of sampling on big data analysis of social media

T2 - A case study on flu and ebola

AU - Xu, Kuai

AU - Wang, Feng

AU - Jia, Xiaohua

AU - Wang, Haiyan

PY - 2016/2/23

Y1 - 2016/2/23

AB - The explosive growth of online social networks in recent years has generated massive data sets on user behaviors, social graphs, and content. Given the scale, heterogeneity, and diversity of such big data, sampling is a simple and intuitive way to reduce the size of the data sets for collecting, measuring, and understanding users, behaviors, and traffic in online social networks. In this paper, we quantify the impact of random sampling on the analysis of online social networks, with Twitter streaming data as a case study. In addition, we design further sampling strategies, including community sampling and stratified sampling, and evaluate their impact on a broad range of behavioral characteristics of online social networks. Our experimental results show that community sampling has the least impact on tweet distributions across users and on the structure of retweeting graphs, while achieving data reductions similar to those of random and stratified sampling.

UR - http://www.scopus.com/inward/record.url?scp=84964861002&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84964861002&partnerID=8YFLogxK

U2 - 10.1109/GLOCOM.2014.7416974

DO - 10.1109/GLOCOM.2014.7416974

M3 - Conference contribution

AN - SCOPUS:84964861002

SN - 9781479959525

BT - 2015 IEEE Global Communications Conference, GLOBECOM 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -