TY - GEN
T1 - Less is more
T2 - 2019 World Wide Web Conference, WWW 2019
AU - Alvari, Hamidreza
AU - Shaabani, Elham
AU - Sarkar, Soumajyoti
AU - Beigi, Ghazaleh
AU - Shakarian, Paulo
N1 - Publisher Copyright:
© 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY-NC-ND 4.0 License.
PY - 2019/5/13
Y1 - 2019/5/13
N2 - Recent years have witnessed a surge of manipulation of public opinion and political events by malicious social media actors. These users are referred to as "Pathogenic Social Media (PSM)" accounts. PSMs are key users in spreading misinformation in social media to viral proportions. These accounts can be either controlled by real users or automated bots. Identification of PSMs is thus of utmost importance for social media authorities. The burden usually falls to automatic approaches that can identify these accounts and protect social media reputation. However, lack of sufficient labeled examples for devising and training sophisticated approaches to combat these accounts is still one of the foremost challenges facing social media firms. In contrast, unlabeled data is abundant and cheap to obtain thanks to massive user-generated data. In this paper, we propose a semi-supervised causal inference PSM detection framework, SemiPsm, to compensate for the lack of labeled data. In particular, the proposed method leverages unlabeled data in the form of manifold regularization and only relies on cascade information. This is in contrast to the existing approaches that use exhaustive feature engineering (e.g., profile information, network structure, etc.). Evidence from empirical experiments on a real-world ISIS-related dataset from Twitter suggests promising results of utilizing unlabeled instances for detecting PSMs.
AB - Recent years have witnessed a surge of manipulation of public opinion and political events by malicious social media actors. These users are referred to as "Pathogenic Social Media (PSM)" accounts. PSMs are key users in spreading misinformation in social media to viral proportions. These accounts can be either controlled by real users or automated bots. Identification of PSMs is thus of utmost importance for social media authorities. The burden usually falls to automatic approaches that can identify these accounts and protect social media reputation. However, lack of sufficient labeled examples for devising and training sophisticated approaches to combat these accounts is still one of the foremost challenges facing social media firms. In contrast, unlabeled data is abundant and cheap to obtain thanks to massive user-generated data. In this paper, we propose a semi-supervised causal inference PSM detection framework, SemiPsm, to compensate for the lack of labeled data. In particular, the proposed method leverages unlabeled data in the form of manifold regularization and only relies on cascade information. This is in contrast to the existing approaches that use exhaustive feature engineering (e.g., profile information, network structure, etc.). Evidence from empirical experiments on a real-world ISIS-related dataset from Twitter suggests promising results of utilizing unlabeled instances for detecting PSMs.
KW - Causal inference
KW - Pathogenic users
KW - Semi-supervised learning
KW - Social media
UR - http://www.scopus.com/inward/record.url?scp=85066887162&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85066887162&partnerID=8YFLogxK
U2 - 10.1145/3308560.3316500
DO - 10.1145/3308560.3316500
M3 - Conference contribution
AN - SCOPUS:85066887162
T3 - The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019
SP - 154
EP - 161
BT - The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019
PB - Association for Computing Machinery, Inc
Y2 - 13 May 2019 through 17 May 2019
ER -