Leveraging knowledge across media for spammer detection in microblogging

Xia Hu, Jiliang Tang, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

25 Scopus citations

Abstract

While microblogging has emerged as an important information sharing and communication platform, it has also become a convenient venue for spammers to overwhelm other users with unwanted content. Currently, spammer detection in microblogging focuses on using social networking information, but little on content analysis due to the distinct nature of microblogging messages. First, label information is hard to obtain. Second, the texts in microblogging are short and noisy. As we know, spammer detection has been extensively studied for years in various media, e.g., emails, SMS and the web. Motivated by abundant resources available in the other media, we investigate whether we can take advantage of the existing resources for spammer detection in microblogging. While people accept that texts in microblogging are different from those in other media, there is no quantitative analysis to show how different they are. In this paper, we first perform a comprehensive linguistic study to compare spam across different media. Inspired by the findings, we present an optimization formulation that enables the design of spammer detection in microblogging using knowledge from external media. We conduct experiments on real-world Twitter datasets to verify (1) whether email, SMS and web spam resources help and (2) how different media help for spammer detection in microblogging.

Original language: English (US)
Title of host publication: SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery
Pages: 547-556
Number of pages: 10
ISBN (Print): 9781450322591
DOIs: https://doi.org/10.1145/2600428.2609632
State: Published - Jan 1 2014
Event: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014 - Gold Coast, QLD, Australia
Duration: Jul 6 2014 to Jul 11 2014

Other

Other: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014
Country: Australia
City: Gold Coast, QLD
Period: 7/6/14 to 7/11/14

Keywords

  • Cross-media mining
  • Emails
  • SMS
  • Social media
  • Spammer detection
  • Twitter
  • Web

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems

Cite this

Hu, X., Tang, J., & Liu, H. (2014). Leveraging knowledge across media for spammer detection in microblogging. In SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 547-556). Association for Computing Machinery. https://doi.org/10.1145/2600428.2609632