Human-Machine Interaction for Improved Cybersecurity Named Entity Recognition Considering Semantic Similarity

Kazuaki Kashihara, Jana Shakarian, Chitta Baral

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The automated and timely conversion or extraction of cybersecurity information from unstructured online text is required by many applications. Named Entity Recognition (NER) is used to detect relevant domain entities such as products, attack names, malware names, and hacker group names. Training a new NER model for cybersecurity traditionally requires a corpus annotated with cybersecurity entities, and state-of-the-art methods require time-consuming, labor-intensive feature engineering. We propose a Human-Machine Interaction method for semi-automatic labeling and corpus generation for cybersecurity entities. Our method evaluates the learned NER model on the sentences collected during the training process, and the user selects only the correct pairs of named entity and category for the next training iteration. Thus, each iteration yields a better training corpus for the NER model. Some entities are ambiguous because a word or phrase can have multiple meanings. We introduce a new semantic similarity measure and determine which category a word belongs to based on the semantic similarity of the entire sentence. The experimental evaluation shows that our method is better than existing methods at finding undiscovered keywords of given categories.

Original language: English (US)
Title of host publication: Intelligent Systems and Applications - Proceedings of the 2020 Intelligent Systems Conference IntelliSys Volume 2
Editors: Kohei Arai, Supriya Kapoor, Rahul Bhatia
Publisher: Springer
Pages: 347-361
Number of pages: 15
ISBN (Print): 9783030551865
State: Published - 2021
Event: Intelligent Systems Conference, IntelliSys 2020 - London, United Kingdom
Duration: Sep 3 2020 → Sep 4 2020

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 1251 AISC
ISSN (Print): 2194-5357
ISSN (Electronic): 2194-5365

Conference

Conference: Intelligent Systems Conference, IntelliSys 2020
Country/Territory: United Kingdom
City: London
Period: 9/3/20 → 9/4/20

Keywords

  • Cybersecurity
  • NER
  • Named Entity Recognition
  • Semantic similarity

ASJC Scopus subject areas

  • Control and Systems Engineering
  • General Computer Science
