Less is more: Semi-supervised causal inference for detecting pathogenic users in social media

Hamidreza Alvari; Elham Shaabani; Soumajyoti Sarkar; Ghazaleh Beigi; Paulo Shakarian

doi:10.1145/3308560.3316500

School of Computing and Augmented Intelligence (IAFSE-SCAI)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

22 Scopus citations

Abstract

Recent years have witnessed a surge of manipulation of public opinion and political events by malicious social media actors. These users are referred to as "Pathogenic Social Media (PSM)" accounts. PSMs are key users in spreading misinformation in social media to viral proportions. These accounts can be either controlled by real users or automated bots. Identification of PSMs is thus of utmost importance for social media authorities. The burden usually falls to automatic approaches that can identify these accounts and protect social media reputation. However, lack of sufficient labeled examples for devising and training sophisticated approaches to combat these accounts is still one of the foremost challenges facing social media firms. In contrast, unlabeled data is abundant and cheap to obtain thanks to massive user-generated data. In this paper, we propose a semi-supervised causal inference PSM detection framework, SemiPsm, to compensate for the lack of labeled data. In particular, the proposed method leverages unlabeled data in the form of manifold regularization and only relies on cascade information. This is in contrast to the existing approaches that use exhaustive feature engineering (e.g., profile information, network structure, etc.). Evidence from empirical experiments on a real-world ISIS-related dataset from Twitter suggests promising results of utilizing unlabeled instances for detecting PSMs.
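
The abstract names manifold regularization but does not give a formulation, so the following is only a minimal sketch of the general idea, not the SemiPsm method itself. It assumes Laplacian-regularized least squares (a standard manifold-regularization technique) on synthetic 2-D points that stand in for per-user features; the function names, the RBF kernel, the k-nearest-neighbor graph, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Gaussian kernel from pairwise squared Euclidean distances.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def knn_laplacian(X, k=5):
    # Unnormalized graph Laplacian L = D - W of a symmetrized kNN graph
    # built over ALL points, labeled and unlabeled alike.
    n = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]  # index 0 is the point itself
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)
    return np.diag(W.sum(1)) - W

def fit_laprls(X, y, labeled_mask, gamma_a=1e-2, gamma_i=1.0, k=5, gamma=0.5):
    # Solve (J K + gamma_a*l*I + gamma_i*l/n^2 * L K) alpha = Y, where J
    # selects labeled rows and Y is zero on unlabeled rows.
    n = X.shape[0]
    l = int(labeled_mask.sum())
    K = rbf_kernel(X, X, gamma)
    L = knn_laplacian(X, k)
    J = np.diag(labeled_mask.astype(float))
    Y = np.where(labeled_mask, y, 0.0)
    A = J @ K + gamma_a * l * np.eye(n) + (gamma_i * l / n**2) * (L @ K)
    alpha = np.linalg.solve(A, Y)
    return alpha, K

def demo(seed=0):
    # Two synthetic clusters; only 5 points per cluster keep their label.
    rng = np.random.default_rng(seed)
    n_per = 100
    X = np.vstack([rng.normal([-2.0, 0.0], 0.5, (n_per, 2)),
                   rng.normal([2.0, 0.0], 0.5, (n_per, 2))])
    y = np.concatenate([-np.ones(n_per), np.ones(n_per)])
    labeled = np.zeros(2 * n_per, dtype=bool)
    labeled[rng.choice(n_per, 5, replace=False)] = True
    labeled[n_per + rng.choice(n_per, 5, replace=False)] = True
    alpha, K = fit_laprls(X, y, labeled)
    pred = np.sign(K @ alpha)
    # Accuracy is measured on the unlabeled points only.
    return float((pred[~labeled] == y[~labeled]).mean())

if __name__ == "__main__":
    print("accuracy on unlabeled points:", demo())
```

The unlabeled points enter only through the graph Laplacian term, which penalizes predictions that change sharply between neighboring points; setting `gamma_i=0` reduces the sketch to ordinary kernel ridge regression on the labeled points.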

Original language	English (US)
Title of host publication	The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019
Publisher	Association for Computing Machinery, Inc
Pages	154-161
Number of pages	8
ISBN (Electronic)	9781450366755
DOIs	https://doi.org/10.1145/3308560.3316500
State	Published - May 13 2019
Event	2019 World Wide Web Conference, WWW 2019 - San Francisco, United States. Duration: May 13 2019 → May 17 2019

Publication series

Name	The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019

Conference

Conference	2019 World Wide Web Conference, WWW 2019
Country/Territory	United States
City	San Francisco
Period	5/13/19 → 5/17/19

Keywords

Causal inference
Pathogenic users
Semi-supervised learning
Social media

ASJC Scopus subject areas

Computer Networks and Communications
Software

Access to Document

10.1145/3308560.3316500

Cite this

Alvari, H., Shaabani, E., Sarkar, S., Beigi, G., & Shakarian, P. (2019). Less is more: Semi-supervised causal inference for detecting pathogenic users in social media. In The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019 (pp. 154-161). (The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019). Association for Computing Machinery, Inc. https://doi.org/10.1145/3308560.3316500

Less is more: Semi-supervised causal inference for detecting pathogenic users in social media. / Alvari, Hamidreza; Shaabani, Elham; Sarkar, Soumajyoti et al.
The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019. Association for Computing Machinery, Inc, 2019. p. 154-161 (The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019).

Alvari, H, Shaabani, E, Sarkar, S, Beigi, G & Shakarian, P 2019, Less is more: Semi-supervised causal inference for detecting pathogenic users in social media. in The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019. The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019, Association for Computing Machinery, Inc, pp. 154-161, 2019 World Wide Web Conference, WWW 2019, San Francisco, United States, 5/13/19. https://doi.org/10.1145/3308560.3316500

Alvari H, Shaabani E, Sarkar S, Beigi G, Shakarian P. Less is more: Semi-supervised causal inference for detecting pathogenic users in social media. In The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019. Association for Computing Machinery, Inc. 2019. p. 154-161. (The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019). doi: 10.1145/3308560.3316500

Alvari, Hamidreza ; Shaabani, Elham ; Sarkar, Soumajyoti et al. / Less is more : Semi-supervised causal inference for detecting pathogenic users in social media. The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019. Association for Computing Machinery, Inc, 2019. pp. 154-161 (The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019).

@inproceedings{cd4e4f73589b46209a1824b92f4eb9d2,
title = "Less is more: Semi-supervised causal inference for detecting pathogenic users in social media",
abstract = "Recent years have witnessed a surge of manipulation of public opinion and political events by malicious social media actors. These users are referred to as {"}Pathogenic Social Media (PSM){"} accounts. PSMs are key users in spreading misinformation in social media to viral proportions. These accounts can be either controlled by real users or automated bots. Identification of PSMs is thus of utmost importance for social media authorities. The burden usually falls to automatic approaches that can identify these accounts and protect social media reputation. However, lack of sufficient labeled examples for devising and training sophisticated approaches to combat these accounts is still one of the foremost challenges facing social media firms. In contrast, unlabeled data is abundant and cheap to obtain thanks to massive user-generated data. In this paper, we propose a semi-supervised causal inference PSM detection framework, SemiPsm, to compensate for the lack of labeled data. In particular, the proposed method leverages unlabeled data in the form of manifold regularization and only relies on cascade information. This is in contrast to the existing approaches that use exhaustive feature engineering (e.g., profile information, network structure, etc.). Evidence from empirical experiments on a real-world ISIS-related dataset from Twitter suggests promising results of utilizing unlabeled instances for detecting PSMs.",
keywords = "Causal inference, Pathogenic users, Semi-supervised learning, Social media",
author = "Hamidreza Alvari and Elham Shaabani and Soumajyoti Sarkar and Ghazaleh Beigi and Paulo Shakarian",
note = "Publisher Copyright: {\textcopyright} 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY-NC-ND 4.0 License.; 2019 World Wide Web Conference, WWW 2019 ; Conference date: 13-05-2019 Through 17-05-2019",
year = "2019",
month = may,
day = "13",
doi = "10.1145/3308560.3316500",
language = "English (US)",
series = "The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019",
publisher = "Association for Computing Machinery, Inc",
pages = "154--161",
booktitle = "The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019",
}

TY - GEN
T1 - Less is more
T2 - 2019 World Wide Web Conference, WWW 2019
AU - Alvari, Hamidreza
AU - Shaabani, Elham
AU - Sarkar, Soumajyoti
AU - Beigi, Ghazaleh
AU - Shakarian, Paulo
N1 - Publisher Copyright: © 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY-NC-ND 4.0 License.
PY - 2019/5/13
Y1 - 2019/5/13
N2 - Recent years have witnessed a surge of manipulation of public opinion and political events by malicious social media actors. These users are referred to as "Pathogenic Social Media (PSM)" accounts. PSMs are key users in spreading misinformation in social media to viral proportions. These accounts can be either controlled by real users or automated bots. Identification of PSMs is thus of utmost importance for social media authorities. The burden usually falls to automatic approaches that can identify these accounts and protect social media reputation. However, lack of sufficient labeled examples for devising and training sophisticated approaches to combat these accounts is still one of the foremost challenges facing social media firms. In contrast, unlabeled data is abundant and cheap to obtain thanks to massive user-generated data. In this paper, we propose a semi-supervised causal inference PSM detection framework, SemiPsm, to compensate for the lack of labeled data. In particular, the proposed method leverages unlabeled data in the form of manifold regularization and only relies on cascade information. This is in contrast to the existing approaches that use exhaustive feature engineering (e.g., profile information, network structure, etc.). Evidence from empirical experiments on a real-world ISIS-related dataset from Twitter suggests promising results of utilizing unlabeled instances for detecting PSMs.
AB - Recent years have witnessed a surge of manipulation of public opinion and political events by malicious social media actors. These users are referred to as "Pathogenic Social Media (PSM)" accounts. PSMs are key users in spreading misinformation in social media to viral proportions. These accounts can be either controlled by real users or automated bots. Identification of PSMs is thus of utmost importance for social media authorities. The burden usually falls to automatic approaches that can identify these accounts and protect social media reputation. However, lack of sufficient labeled examples for devising and training sophisticated approaches to combat these accounts is still one of the foremost challenges facing social media firms. In contrast, unlabeled data is abundant and cheap to obtain thanks to massive user-generated data. In this paper, we propose a semi-supervised causal inference PSM detection framework, SemiPsm, to compensate for the lack of labeled data. In particular, the proposed method leverages unlabeled data in the form of manifold regularization and only relies on cascade information. This is in contrast to the existing approaches that use exhaustive feature engineering (e.g., profile information, network structure, etc.). Evidence from empirical experiments on a real-world ISIS-related dataset from Twitter suggests promising results of utilizing unlabeled instances for detecting PSMs.
KW - Causal inference
KW - Pathogenic users
KW - Semi-supervised learning
KW - Social media
UR - http://www.scopus.com/inward/record.url?scp=85066887162&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066887162&partnerID=8YFLogxK
U2 - 10.1145/3308560.3316500
DO - 10.1145/3308560.3316500
M3 - Conference contribution
AN - SCOPUS:85066887162
T3 - The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019
SP - 154
EP - 161
BT - The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019
PB - Association for Computing Machinery, Inc
Y2 - 13 May 2019 through 17 May 2019
ER -
