Hot topic analysis and content mining in social media

Qian Yu, Wei Tao Weng, Kai Zhang, Kai Lei, Kuai Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Sina Weibo has become an increasingly critical social media platform in China for sharing the latest news, marketing new products, and discussing controversial issues. The rising importance of Sina Weibo to society makes it essential to understand the "what", "when", and "who" of hot topics that are continuously tweeted and searched by millions of active users. In this paper, we develop a systematic approach to characterize the temporal distribution of hot topics searched by Sina Weibo users over a four-month time span and to uncover correlated hot topics that are not only tweeted by the same users but also appear in similar sets of tweet messages. We analyze real-time Sina Weibo tweet data streams and study volume correlations and temporal gaps between user searches and tweeting activities on hot topics. In addition, we examine the correlations between hot topic searches on social media and on search engines to understand hot topics and user behaviors across different platforms. Given the challenges of analyzing massive amounts of tweet data, we explore the Hadoop MapReduce framework to effectively process millions of tweets from the collected data sets, and we quantify the performance benefits of MapReduce for analyzing tweet streams. To the best of our knowledge, this paper is the first effort to characterize temporal search patterns of hot topics on Sina Weibo and to study their correlations with tweeting data streams as well as search engine statistics.

Original language: English (US)
Title of host publication: 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781479975754
State: Published - Jan 20 2015
Event: 33rd IEEE International Performance Computing and Communications Conference, IPCCC 2014 - Austin, United States
Duration: Dec 5 2014 → Dec 7 2014

Publication series

Name: 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014
Volume: 2014-January

Other

Other: 33rd IEEE International Performance Computing and Communications Conference, IPCCC 2014
Country/Territory: United States
City: Austin
Period: 12/5/14 → 12/7/14

ASJC Scopus subject areas

  • Software
  • Computational Theory and Mathematics
  • Computer Networks and Communications
