Darkembed: Exploit prediction with neural language models

Nazgol Tavabi; Palash Goyal; Mohammed Almukaynizi; Paulo Shakarian; Kristina Lerman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Software vulnerabilities can expose computer systems to attacks by malicious actors. With the number of vulnerabilities discovered in the recent years surging, creating timely patches for every vulnerability is not always feasible. At the same time, not every vulnerability will be exploited by attackers; hence, prioritizing vulnerabilities by assessing the likelihood they will be exploited has become an important research problem. Recent works used machine learning techniques to predict exploited vulnerabilities by analyzing discussions about vulnerabilities on social media. These methods relied on traditional text processing techniques, which represent statistical features of words, but fail to capture their context. To address this challenge, we propose DarkEmbed, a neural language modeling approach that learns low dimensional distributed representations, i.e., embeddings, of darkweb/deepweb discussions to predict whether vulnerabilities will be exploited. By capturing linguistic regularities of human language, such as syntactic, semantic similarity and logic analogy, the learned embeddings are better able to classify discussions about exploited vulnerabilities than traditional text analysis methods. Evaluations demonstrate the efficacy of learned embeddings on both structured text (such as security blog posts) and unstructured text (darkweb/deepweb posts). DarkEmbed outperforms state-of-the-art approaches on the exploit prediction task with an F₁-score of 0.74.

Original language	English (US)
Title of host publication	32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Publisher	AAAI press
Pages	7849-7854
Number of pages	6
ISBN (Electronic)	9781577358008
State	Published - 2018
Event	32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States Duration: Feb 2 2018 → Feb 7 2018

Publication series

Name	32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Other

Other	32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Country/Territory	United States
City	New Orleans
Period	2/2/18 → 2/7/18

ASJC Scopus subject areas

Artificial Intelligence

Cite this

Tavabi, N, Goyal, P, Almukaynizi, M, Shakarian, P & Lerman, K 2018, Darkembed: Exploit prediction with neural language models. in 32nd AAAI Conference on Artificial Intelligence, AAAI 2018. 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, AAAI press, pp. 7849-7854, 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, New Orleans, United States, 2/2/18.

@inproceedings{913d8a144302427594fcac05fb45c955,

title = "Darkembed: Exploit prediction with neural language models",

abstract = "Software vulnerabilities can expose computer systems to attacks by malicious actors. With the number of vulnerabilities discovered in the recent years surging, creating timely patches for every vulnerability is not always feasible. At the same time, not every vulnerability will be exploited by attackers; hence, prioritizing vulnerabilities by assessing the likelihood they will be exploited has become an important research problem. Recent works used machine learning techniques to predict exploited vulnerabilities by analyzing discussions about vulnerabilities on social media. These methods relied on traditional text processing techniques, which represent statistical features of words, but fail to capture their context. To address this challenge, we propose DarkEmbed, a neural language modeling approach that learns low dimensional distributed representations, i.e., embeddings, of darkweb/deepweb discussions to predict whether vulnerabilities will be exploited. By capturing linguistic regularities of human language, such as syntactic, semantic similarity and logic analogy, the learned embeddings are better able to classify discussions about exploited vulnerabilities than traditional text analysis methods. Evaluations demonstrate the efficacy of learned embeddings on both structured text (such as security blog posts) and unstructured text (darkweb/deepweb posts). DarkEmbed outperforms state-of-the-art approaches on the exploit prediction task with an F1-score of 0.74.",

author = "Nazgol Tavabi and Palash Goyal and Mohammed Almukaynizi and Paulo Shakarian and Kristina Lerman",

note = "Funding Information: This work was supported by the Office of the Director of National Intelligence (ODNI) and the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory (AFRL) contract number FA8750-16-C-0112. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, AFRL, or the U.S. Government. Publisher Copyright: Copyright {\textcopyright} 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.; 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 ; Conference date: 02-02-2018 Through 07-02-2018",

year = "2018",

language = "English (US)",

series = "32nd AAAI Conference on Artificial Intelligence, AAAI 2018",

publisher = "AAAI press",

pages = "7849--7854",

booktitle = "32nd AAAI Conference on Artificial Intelligence, AAAI 2018",

}

TY - GEN

T1 - Darkembed

T2 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

AU - Tavabi, Nazgol

AU - Goyal, Palash

AU - Almukaynizi, Mohammed

AU - Shakarian, Paulo

AU - Lerman, Kristina

N1 - Funding Information: This work was supported by the Office of the Director of National Intelligence (ODNI) and the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory (AFRL) contract number FA8750-16-C-0112. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, AFRL, or the U.S. Government. Publisher Copyright: Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

PY - 2018

Y1 - 2018

N2 - Software vulnerabilities can expose computer systems to attacks by malicious actors. With the number of vulnerabilities discovered in the recent years surging, creating timely patches for every vulnerability is not always feasible. At the same time, not every vulnerability will be exploited by attackers; hence, prioritizing vulnerabilities by assessing the likelihood they will be exploited has become an important research problem. Recent works used machine learning techniques to predict exploited vulnerabilities by analyzing discussions about vulnerabilities on social media. These methods relied on traditional text processing techniques, which represent statistical features of words, but fail to capture their context. To address this challenge, we propose DarkEmbed, a neural language modeling approach that learns low dimensional distributed representations, i.e., embeddings, of darkweb/deepweb discussions to predict whether vulnerabilities will be exploited. By capturing linguistic regularities of human language, such as syntactic, semantic similarity and logic analogy, the learned embeddings are better able to classify discussions about exploited vulnerabilities than traditional text analysis methods. Evaluations demonstrate the efficacy of learned embeddings on both structured text (such as security blog posts) and unstructured text (darkweb/deepweb posts). DarkEmbed outperforms state-of-the-art approaches on the exploit prediction task with an F1-score of 0.74.

AB - Software vulnerabilities can expose computer systems to attacks by malicious actors. With the number of vulnerabilities discovered in the recent years surging, creating timely patches for every vulnerability is not always feasible. At the same time, not every vulnerability will be exploited by attackers; hence, prioritizing vulnerabilities by assessing the likelihood they will be exploited has become an important research problem. Recent works used machine learning techniques to predict exploited vulnerabilities by analyzing discussions about vulnerabilities on social media. These methods relied on traditional text processing techniques, which represent statistical features of words, but fail to capture their context. To address this challenge, we propose DarkEmbed, a neural language modeling approach that learns low dimensional distributed representations, i.e., embeddings, of darkweb/deepweb discussions to predict whether vulnerabilities will be exploited. By capturing linguistic regularities of human language, such as syntactic, semantic similarity and logic analogy, the learned embeddings are better able to classify discussions about exploited vulnerabilities than traditional text analysis methods. Evaluations demonstrate the efficacy of learned embeddings on both structured text (such as security blog posts) and unstructured text (darkweb/deepweb posts). DarkEmbed outperforms state-of-the-art approaches on the exploit prediction task with an F1-score of 0.74.

UR - http://www.scopus.com/inward/record.url?scp=85056753402&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056753402&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85056753402

T3 - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

SP - 7849

EP - 7854

BT - 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

PB - AAAI press

Y2 - 2 February 2018 through 7 February 2018

ER -
