Hot topic analysis and content mining in social media

Qian Yu, Wei Tao Weng, Kai Zhang, Kai Lei, Kuai Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

  • 3 Citations

Abstract

Sina Weibo has become an increasingly important social media platform in China for sharing the latest news, marketing new products, and discussing controversial issues. Its growing influence on society makes it important to understand the "what", "when", and "who" of hot topics that are continuously tweeted and searched by millions of active users. In this paper, we develop a systematic approach to characterize the temporal distribution of hot topics searched by Sina Weibo users over a four-month time span and to uncover correlated hot topics that are not only tweeted by the same users but also appear in similar sets of tweet messages. We analyze real-time Sina Weibo tweet data streams and study volume correlations and temporal gaps between user searches and tweeting activities on hot topics. In addition, we examine the correlations between hot topic searches on social media and on search engines to understand hot topics and user behaviors across different platforms. Given the challenges of analyzing massive amounts of tweet data, we explore the Hadoop MapReduce framework to process millions of tweets from the collected data sets, and quantify the performance benefits of MapReduce in analyzing tweet streams. To the best of our knowledge, this paper is the first effort to characterize temporal search patterns of hot topics on Sina Weibo and to study their correlations with tweeting data streams as well as search engine statistics.
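
As an illustration only, the idea of "correlated hot topics" that share both users and tweet messages could be sketched with a set-overlap measure. The abstract does not say which similarity measure or thresholds the authors use, so the Jaccard index, the function names, the 0.5 thresholds, and the toy data below are all assumptions made for this sketch.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def correlated_topics(users_by_topic, tweets_by_topic,
                      min_user_sim=0.5, min_tweet_sim=0.5):
    """Return topic pairs whose user sets AND tweet sets both overlap enough.

    The thresholds are hypothetical; they are not taken from the paper.
    """
    pairs = []
    for t1, t2 in combinations(sorted(users_by_topic), 2):
        user_sim = jaccard(users_by_topic[t1], users_by_topic[t2])
        tweet_sim = jaccard(tweets_by_topic.get(t1, set()),
                            tweets_by_topic.get(t2, set()))
        if user_sim >= min_user_sim and tweet_sim >= min_tweet_sim:
            pairs.append((t1, t2, user_sim, tweet_sim))
    return pairs


# Toy data (hypothetical): which users and which tweets mention each topic.
users_by_topic = {
    "topic_a": {"u1", "u2", "u3"},
    "topic_b": {"u2", "u3", "u4"},
    "topic_c": {"u9"},
}
tweets_by_topic = {
    "topic_a": {"t1", "t2"},
    "topic_b": {"t1", "t2", "t3"},
    "topic_c": {"t7"},
}

result = correlated_topics(users_by_topic, tweets_by_topic)
for t1, t2, user_sim, tweet_sim in result:
    print(f"{t1} ~ {t2}: users={user_sim:.2f}, tweets={tweet_sim:.2f}")
```

In this toy data only topic_a and topic_b pass both thresholds. At the scale the abstract describes, the per-topic sets would more plausibly be built by a distributed job than held in memory.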

Language: English (US)
Title of host publication: 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Volume: 2014-January
ISBN (Electronic): 9781479975754
DOI: https://doi.org/10.1109/PCCC.2014.7017056
State: Published - Jan 20 2015
Event: 33rd IEEE International Performance Computing and Communications Conference, IPCCC 2014 - Austin, United States
Duration: Dec 5 2014 → Dec 7 2014

Other

Other: 33rd IEEE International Performance Computing and Communications Conference, IPCCC 2014
Country: United States
City: Austin
Period: 12/5/14 → 12/7/14

Fingerprint

Search engines
Marketing
Statistics

ASJC Scopus subject areas

  • Software
  • Computational Theory and Mathematics
  • Computer Networks and Communications

Cite this

Yu, Q., Weng, W. T., Zhang, K., Lei, K., & Xu, K. (2015). Hot topic analysis and content mining in social media. In 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014 (Vol. 2014-January). [7017056] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PCCC.2014.7017056

Hot topic analysis and content mining in social media. / Yu, Qian; Weng, Wei Tao; Zhang, Kai; Lei, Kai; Xu, Kuai.

2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014. Vol. 2014-January. Institute of Electrical and Electronics Engineers Inc., 2015. 7017056.

Yu, Q, Weng, WT, Zhang, K, Lei, K & Xu, K 2015, Hot topic analysis and content mining in social media. in 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014. vol. 2014-January, 7017056, Institute of Electrical and Electronics Engineers Inc., 33rd IEEE International Performance Computing and Communications Conference, IPCCC 2014, Austin, United States, 12/5/14. https://doi.org/10.1109/PCCC.2014.7017056
Yu Q, Weng WT, Zhang K, Lei K, Xu K. Hot topic analysis and content mining in social media. In 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014. Vol. 2014-January. Institute of Electrical and Electronics Engineers Inc. 2015. 7017056 https://doi.org/10.1109/PCCC.2014.7017056
Yu, Qian; Weng, Wei Tao; Zhang, Kai; Lei, Kai; Xu, Kuai. / Hot topic analysis and content mining in social media. 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014. Vol. 2014-January. Institute of Electrical and Electronics Engineers Inc., 2015.
@inproceedings{798b837ca8424d9c97abb2f3a33aa474,
title = "Hot topic analysis and content mining in social media",
author = "Qian Yu and Weng, {Wei Tao} and Kai Zhang and Kai Lei and Kuai Xu",
year = "2015",
month = "1",
day = "20",
doi = "10.1109/PCCC.2014.7017056",
language = "English (US)",
volume = "2014-January",
booktitle = "2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - Hot topic analysis and content mining in social media

AU - Yu, Qian

AU - Weng, Wei Tao

AU - Zhang, Kai

AU - Lei, Kai

AU - Xu, Kuai

PY - 2015/1/20

Y1 - 2015/1/20

UR - http://www.scopus.com/inward/record.url?scp=84983135048&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84983135048&partnerID=8YFLogxK

U2 - 10.1109/PCCC.2014.7017056

DO - 10.1109/PCCC.2014.7017056

M3 - Conference contribution

VL - 2014-January

BT - 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014

PB - Institute of Electrical and Electronics Engineers Inc.

ER -