TY - GEN
T1 - The impact of sampling on big data analysis of social media
T2 - 58th IEEE Global Communications Conference, GLOBECOM 2015
AU - Xu, Kuai
AU - Wang, Feng
AU - Jia, Xiaohua
AU - Wang, Haiyan
N1 - Funding Information:
This work was supported in part by National Science Foundation grant CNS #1218212 and by RGC of Hong Kong Project No. CityU 114713
Publisher Copyright:
© 2015 IEEE.
PY - 2015
Y1 - 2015
N2 - The explosive growth of online social networks in recent years has generated massive data sets on user behaviors, social graphs, and content. Given the scale, heterogeneity, and diversity of such big data, sampling is a simple and intuitive approach to reducing the size of the data sets for collecting, measuring, and understanding users, behaviors, and traffic in online social networks. In this paper, we quantify the impact of random sampling on the analysis of online social networks, with Twitter streaming data as a case study. In addition, we design different sampling strategies, including community sampling and stratified sampling, and evaluate their impact on a broad range of behavioral characteristics of online social networks. Our experimental results show that community sampling has the least impact on tweet distributions across users and the structure of retweeting graphs, while achieving data reductions similar to those of random and stratified sampling.
AB - The explosive growth of online social networks in recent years has generated massive data sets on user behaviors, social graphs, and content. Given the scale, heterogeneity, and diversity of such big data, sampling is a simple and intuitive approach to reducing the size of the data sets for collecting, measuring, and understanding users, behaviors, and traffic in online social networks. In this paper, we quantify the impact of random sampling on the analysis of online social networks, with Twitter streaming data as a case study. In addition, we design different sampling strategies, including community sampling and stratified sampling, and evaluate their impact on a broad range of behavioral characteristics of online social networks. Our experimental results show that community sampling has the least impact on tweet distributions across users and the structure of retweeting graphs, while achieving data reductions similar to those of random and stratified sampling.
UR - http://www.scopus.com/inward/record.url?scp=84964861002&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964861002&partnerID=8YFLogxK
U2 - 10.1109/GLOCOM.2014.7416974
DO - 10.1109/GLOCOM.2014.7416974
M3 - Conference contribution
AN - SCOPUS:84964861002
T3 - 2015 IEEE Global Communications Conference, GLOBECOM 2015
BT - 2015 IEEE Global Communications Conference, GLOBECOM 2015
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 6 December 2015 through 10 December 2015
ER -