TY - GEN
T1 - DarkEmbed
T2 - 30th Innovative Applications of Artificial Intelligence Conference, IAAI 2018
AU - Tavabi, Nazgol
AU - Goyal, Palash
AU - Almukaynizi, Mohammed
AU - Shakarian, Paulo
AU - Lerman, Kristina
N1 - Funding Information:
This work was supported by the Office of the Director of National Intelligence (ODNI) and the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory (AFRL) contract number FA8750-16-C-0112. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, AFRL, or the U.S. Government.
Publisher Copyright:
© 2018 Proceedings of the 30th Innovative Applications of Artificial Intelligence Conference, IAAI 2018. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Software vulnerabilities can expose computer systems to attacks by malicious actors. With the number of vulnerabilities discovered in recent years surging, creating timely patches for every vulnerability is not always feasible. At the same time, not every vulnerability will be exploited by attackers; hence, prioritizing vulnerabilities by assessing the likelihood that they will be exploited has become an important research problem. Recent work used machine learning techniques to predict exploited vulnerabilities by analyzing discussions about vulnerabilities on social media. These methods relied on traditional text processing techniques, which represent statistical features of words but fail to capture their context. To address this challenge, we propose DarkEmbed, a neural language modeling approach that learns low-dimensional distributed representations, i.e., embeddings, of darkweb/deepweb discussions to predict whether vulnerabilities will be exploited. By capturing linguistic regularities of human language, such as syntactic and semantic similarity and logical analogy, the learned embeddings are better able to classify discussions about exploited vulnerabilities than traditional text analysis methods. Evaluations demonstrate the efficacy of learned embeddings on both structured text (such as security blog posts) and unstructured text (darkweb/deepweb posts). DarkEmbed outperforms state-of-the-art approaches on the exploit prediction task with an F1-score of 0.74.
UR - http://www.scopus.com/inward/record.url?scp=85102203426&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102203426&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85102203426
T3 - Proceedings of the 30th Innovative Applications of Artificial Intelligence Conference, IAAI 2018
SP - 7849
EP - 7854
BT - Proceedings of the 30th Innovative Applications of Artificial Intelligence Conference, IAAI 2018
A2 - Youngblood, G. Michael
A2 - Myers, Karen
PB - The AAAI Press
Y2 - 2 February 2018 through 7 February 2018
ER -