Abstract

While microblogging has emerged as an important information sharing and communication platform, it has also become a convenient venue for spammers to overwhelm other users with unwanted content. Currently, spammer detection in microblogging focuses on using social networking information, but little on content analysis due to the distinct nature of microblogging messages. First, label information is hard to obtain. Second, the texts in microblogging are short and noisy. As we know, spammer detection has been extensively studied for years in various media, e.g., emails, SMS and the web. Motivated by abundant resources available in the other media, we investigate whether we can take advantage of the existing resources for spammer detection in microblogging. While people accept that texts in microblogging are different from those in other media, there is no quantitative analysis to show how different they are. In this paper, we first perform a comprehensive linguistic study to compare spam across different media. Inspired by the findings, we present an optimization formulation that enables the design of spammer detection in microblogging using knowledge from external media. We conduct experiments on real-world Twitter datasets to verify (1) whether email, SMS and web spam resources help and (2) how different media help for spammer detection in microblogging.

Original language: English (US)
Title of host publication: SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery
Pages: 547-556
Number of pages: 10
ISBN (Print): 9781450322591
DOIs: https://doi.org/10.1145/2600428.2609632
State: Published - 2014
Event: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014 - Gold Coast, QLD, Australia
Duration: Jul 6 2014 to Jul 11 2014

Other

Other: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014
Country: Australia
City: Gold Coast, QLD
Period: 7/6/14 to 7/11/14

Fingerprint

Electronic mail
Linguistics
Labels
Communication
Experiments

Keywords

  • Cross-media mining
  • Emails
  • SMS
  • Social media
  • Spammer detection
  • Twitter
  • Web

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems

Cite this

Hu, X., Tang, J., & Liu, H. (2014). Leveraging knowledge across media for spammer detection in microblogging. In SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 547-556). Association for Computing Machinery. https://doi.org/10.1145/2600428.2609632

Leveraging knowledge across media for spammer detection in microblogging. / Hu, Xia; Tang, Jiliang; Liu, Huan.

SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, 2014. p. 547-556.

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Hu, X, Tang, J & Liu, H 2014, Leveraging knowledge across media for spammer detection in microblogging. in SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, pp. 547-556, 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014, Gold Coast, QLD, Australia, 7/6/14. https://doi.org/10.1145/2600428.2609632
Hu X, Tang J, Liu H. Leveraging knowledge across media for spammer detection in microblogging. In SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery. 2014. p. 547-556 https://doi.org/10.1145/2600428.2609632
Hu, Xia ; Tang, Jiliang ; Liu, Huan. / Leveraging knowledge across media for spammer detection in microblogging. SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval. Association for Computing Machinery, 2014. pp. 547-556
@inproceedings{aa571d036f87476ebaeb53ffef28e6ce,
title = "Leveraging knowledge across media for spammer detection in microblogging",
abstract = "While microblogging has emerged as an important information sharing and communication platform, it has also become a convenient venue for spammers to overwhelm other users with unwanted content. Currently, spammer detection in microblogging focuses on using social networking information, but little on content analysis due to the distinct nature of microblogging messages. First, label information is hard to obtain. Second, the texts in microblogging are short and noisy. As we know, spammer detection has been extensively studied for years in various media, e.g., emails, SMS and the web. Motivated by abundant resources available in the other media, we investigate whether we can take advantage of the existing resources for spammer detection in microblogging. While people accept that texts in microblogging are different from those in other media, there is no quantitative analysis to show how different they are. In this paper, we first perform a comprehensive linguistic study to compare spam across different media. Inspired by the findings, we present an optimization formulation that enables the design of spammer detection in microblogging using knowledge from external media. We conduct experiments on real-world Twitter datasets to verify (1) whether email, SMS and web spam resources help and (2) how different media help for spammer detection in microblogging.",
keywords = "Cross-media mining, Emails, SMS, Social media, Spammer detection, Twitter, Web",
author = "Xia Hu and Jiliang Tang and Huan Liu",
year = "2014",
doi = "10.1145/2600428.2609632",
language = "English (US)",
isbn = "9781450322591",
pages = "547--556",
booktitle = "SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval",
publisher = "Association for Computing Machinery",

}

TY - GEN

T1 - Leveraging knowledge across media for spammer detection in microblogging

AU - Hu, Xia

AU - Tang, Jiliang

AU - Liu, Huan

PY - 2014

Y1 - 2014

N2 - While microblogging has emerged as an important information sharing and communication platform, it has also become a convenient venue for spammers to overwhelm other users with unwanted content. Currently, spammer detection in microblogging focuses on using social networking information, but little on content analysis due to the distinct nature of microblogging messages. First, label information is hard to obtain. Second, the texts in microblogging are short and noisy. As we know, spammer detection has been extensively studied for years in various media, e.g., emails, SMS and the web. Motivated by abundant resources available in the other media, we investigate whether we can take advantage of the existing resources for spammer detection in microblogging. While people accept that texts in microblogging are different from those in other media, there is no quantitative analysis to show how different they are. In this paper, we first perform a comprehensive linguistic study to compare spam across different media. Inspired by the findings, we present an optimization formulation that enables the design of spammer detection in microblogging using knowledge from external media. We conduct experiments on real-world Twitter datasets to verify (1) whether email, SMS and web spam resources help and (2) how different media help for spammer detection in microblogging.

KW - Cross-media mining

KW - Emails

KW - SMS

KW - Social media

KW - Spammer detection

KW - Twitter

KW - Web

UR - http://www.scopus.com/inward/record.url?scp=84904552431&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84904552431&partnerID=8YFLogxK

U2 - 10.1145/2600428.2609632

DO - 10.1145/2600428.2609632

M3 - Conference contribution

SN - 9781450322591

SP - 547

EP - 556

BT - SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval

PB - Association for Computing Machinery

ER -