Abstract
Online cybercriminal communities exist in various geopolitical regions, including America, China, and Russia. In some multilingual forums, cybercriminals of differing geopolitical origins interact and exchange hacking knowledge and cybercriminal assets. Researchers can study such forums to better understand the global cybercriminal supply chain and cybercrime trends. However, little work has focused on identifying members of different language groups and geopolitical origins within such forums. One challenge is the need for a technique that scales across multiple languages. We are motivated to explore computational techniques that support automated and scalable categorization of cybercriminal forum participants into language groups. In particular, we use Paragraph Vectors, a state-of-the-art neural network language model, to generate fixed-length vector representations (i.e., document embeddings) of messages posted by forum participants. Results indicate that Paragraph Vectors outperforms traditional n-gram frequency approaches for generating document embeddings that are useful for clustering cybercriminals into language groups.
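As a rough illustration of the n-gram frequency baseline mentioned in the abstract, the sketch below builds character trigram count vectors for a few messages and compares them with cosine similarity. It is not the Paragraph Vectors model and not the authors' implementation. The messages are made up for illustration, and the names `char_ngrams` and `cosine` are hypothetical helpers.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased message."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up example messages (an assumption), not data from the study.
posts = {
    "en_1": "selling fresh accounts, contact me for the price",
    "en_2": "looking for the best price on accounts, contact me",
    "ru_1": "продаю свежие аккаунты, пишите в личные сообщения",
    "ru_2": "куплю аккаунты, пишите цену в личные сообщения",
}
vectors = {name: char_ngrams(text) for name, text in posts.items()}

same_en = cosine(vectors["en_1"], vectors["en_2"])
same_ru = cosine(vectors["ru_1"], vectors["ru_2"])
cross = cosine(vectors["en_1"], vectors["ru_1"])
print(f"en/en={same_en:.2f} ru/ru={same_ru:.2f} en/ru={cross:.2f}")
```

In this toy setting, messages in the same language share many trigrams while messages in different scripts share almost none, so a clustering algorithm run on such vectors would tend to separate the two groups. Real forum text is noisier, which is the setting where the abstract reports that learned document embeddings perform better.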
Original language | English (US) |
---|---|
Title of host publication | IEEE International Conference on Intelligence and Security Informatics: Cybersecurity and Big Data, ISI 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 205-207 |
Number of pages | 3 |
ISBN (Electronic) | 9781509038657 |
State | Published - Nov 15 2016 |
Event | 14th IEEE International Conference on Intelligence and Security Informatics, ISI 2016 - Tucson, United States; Duration: Sep 28 2016 → Sep 30 2016 |
Other
Other | 14th IEEE International Conference on Intelligence and Security Informatics, ISI 2016 |
---|---|
Country/Territory | United States |
City | Tucson |
Period | 9/28/16 → 9/30/16 |
Keywords
- Cybercriminal community
- Cybersecurity
- Language modeling
- Multilingual
- Neural network
ASJC Scopus subject areas
- Information Systems
- Artificial Intelligence
- Computer Networks and Communications
- Information Systems and Management
- Safety, Risk, Reliability and Quality