Hot topic analysis and content mining in social media

Qian Yu, Wei Tao Weng, Kai Zhang, Kai Lei, Kuai Xu

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    4 Citations (Scopus)

    Abstract

    Sina Weibo has become an increasingly critical social media platform in China for sharing the latest news, marketing new products, and discussing controversial issues. The rising importance of Sina Weibo to society makes it very important to understand the "what", "when", and "who" of hot topics that are continuously tweeted and searched by millions of active users. In this paper, we develop a systematic approach to characterize the temporal distribution of hot topics searched by Sina Weibo users over a four-month time span and to uncover correlated hot topics that are not only tweeted by the same users but also appear in similar sets of tweet messages. We analyze real-time Sina Weibo tweet data streams and study volume correlations and temporal gaps between user searches and tweeting activities on hot topics. In addition, we examine the correlations between hot topic searches on social media and on search engines to understand hot topics and user behaviors across different platforms. Given the challenges of analyzing massive amounts of tweet data, we explore the Hadoop MapReduce framework to effectively process millions of tweets from the collected data sets, and quantify the performance benefits of MapReduce for analyzing tweet streams. To the best of our knowledge, this paper is the first effort to characterize temporal search patterns of hot topics on Sina Weibo and to study their correlations with tweeting data streams as well as search engine statistics.
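
The abstract mentions volume correlations and temporal gaps between searches and tweets but does not say how they are measured. As a rough illustration only, the sketch below assumes two equal-length volume series for a single topic (for example, daily counts) and uses the Pearson coefficient over shifted windows; the function names `pearson` and `best_lag` and the choice of Pearson correlation are assumptions, not the authors' method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series; report 0
    return cov / sqrt(vx * vy)


def best_lag(searches, tweets, max_lag):
    """Return (lag, r) maximizing the correlation of searches[t] with tweets[t + lag].

    A positive lag means tweet volume trails search volume by `lag` steps.
    Hypothetical helper: both series are assumed to cover the same time steps.
    """
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = searches[:len(searches) - lag], tweets[lag:]
        else:
            a, b = searches[-lag:], tweets[:len(tweets) + lag]
        if len(a) < 2:
            continue  # too little overlap to correlate at this shift
        r = pearson(a, b)
        if best is None or r > best[1]:
            best = (lag, r)
    return best
```

Calling `best_lag(search_counts, tweet_counts, max_lag=3)` on two made-up series would return the shift at which the volumes line up best, which is one possible reading of a "temporal gap".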

    Original language: English (US)
    Title of host publication: 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Volume: 2014-January
    ISBN (Electronic): 9781479975754
    DOIs: https://doi.org/10.1109/PCCC.2014.7017056
    State: Published - Jan 20 2015
    Event: 33rd IEEE International Performance Computing and Communications Conference, IPCCC 2014 - Austin, United States
    Duration: Dec 5 2014 to Dec 7 2014

    Other

    Other: 33rd IEEE International Performance Computing and Communications Conference, IPCCC 2014
    Country: United States
    City: Austin
    Period: 12/5/14 to 12/7/14

    Fingerprint

    Search engines
    Marketing
    Statistics

    ASJC Scopus subject areas

    • Software
    • Computational Theory and Mathematics
    • Computer Networks and Communications

    Cite this

    Yu, Q., Weng, W. T., Zhang, K., Lei, K., & Xu, K. (2015). Hot topic analysis and content mining in social media. In 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014 (Vol. 2014-January). [7017056] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/PCCC.2014.7017056

    Hot topic analysis and content mining in social media. / Yu, Qian; Weng, Wei Tao; Zhang, Kai; Lei, Kai; Xu, Kuai.

    2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014. Vol. 2014-January Institute of Electrical and Electronics Engineers Inc., 2015. 7017056.

    Yu, Q, Weng, WT, Zhang, K, Lei, K & Xu, K 2015, Hot topic analysis and content mining in social media. in 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014. vol. 2014-January, 7017056, Institute of Electrical and Electronics Engineers Inc., 33rd IEEE International Performance Computing and Communications Conference, IPCCC 2014, Austin, United States, 12/5/14. https://doi.org/10.1109/PCCC.2014.7017056
    Yu Q, Weng WT, Zhang K, Lei K, Xu K. Hot topic analysis and content mining in social media. In 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014. Vol. 2014-January. Institute of Electrical and Electronics Engineers Inc. 2015. 7017056 https://doi.org/10.1109/PCCC.2014.7017056
    Yu, Qian ; Weng, Wei Tao ; Zhang, Kai ; Lei, Kai ; Xu, Kuai. / Hot topic analysis and content mining in social media. 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014. Vol. 2014-January Institute of Electrical and Electronics Engineers Inc., 2015.
    @inproceedings{798b837ca8424d9c97abb2f3a33aa474,
    title = "Hot topic analysis and content mining in social media",
    author = "Qian Yu and Weng, {Wei Tao} and Kai Zhang and Kai Lei and Kuai Xu",
    year = "2015",
    month = "1",
    day = "20",
    doi = "10.1109/PCCC.2014.7017056",
    language = "English (US)",
    volume = "2014-January",
    booktitle = "2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014",
    publisher = "Institute of Electrical and Electronics Engineers Inc.",
    address = "United States",

    }

    TY - GEN

    T1 - Hot topic analysis and content mining in social media

    AU - Yu, Qian

    AU - Weng, Wei Tao

    AU - Zhang, Kai

    AU - Lei, Kai

    AU - Xu, Kuai

    PY - 2015/1/20

    Y1 - 2015/1/20

    UR - http://www.scopus.com/inward/record.url?scp=84983135048&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84983135048&partnerID=8YFLogxK

    U2 - 10.1109/PCCC.2014.7017056

    DO - 10.1109/PCCC.2014.7017056

    M3 - Conference contribution

    VL - 2014-January

    BT - 2014 IEEE 33rd International Performance Computing and Communications Conference, IPCCC 2014

    PB - Institute of Electrical and Electronics Engineers Inc.

    ER -