CyNER: A Python Library for Cybersecurity Named Entity Recognition

Md Tanvirul Alam, Dipkamal Bhusal, Youngja Park, Nidhi Rastogi
Open Cyber threat intelligence (OpenCTI) information is available in an unstructured format from heterogeneous sources on the Internet. We present CyNER, an open-source Python library for cybersecurity named entity recognition (NER). CyNER combines transformer-based models for extracting cybersecurity-related entities, heuristics for extracting different indicators of compromise, and publicly available NER models for generic entity types. We provide models trained on a diverse corpus that users…
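
To make the idea of combining heuristic indicator extraction with model output more concrete, here is a minimal sketch in Python. It is not CyNER's API or implementation: the regular expressions, the `Entity` tuple, the `extract_iocs` and `merge_entities` helpers, and the "earlier source wins on overlap" rule are all assumptions chosen for illustration.

```python
import re
from typing import Iterable, List, NamedTuple


class Entity(NamedTuple):
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    label: str
    text: str


# Hypothetical patterns for a few indicator types; real heuristics
# would need to be broader and more carefully validated.
_OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IOC_PATTERNS = {
    "CVE": re.compile(r"\bCVE-\d{4}-\d{4,}\b"),
    "IPv4": re.compile(r"\b(?:" + _OCTET + r"\.){3}" + _OCTET + r"\b"),
    "SHA256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "MD5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}


def extract_iocs(text: str) -> List[Entity]:
    """Return regex-matched indicator spans, sorted by position."""
    found = []
    for label, pattern in IOC_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(Entity(match.start(), match.end(), label, match.group()))
    return sorted(found)


def merge_entities(*sources: Iterable[Entity]) -> List[Entity]:
    """Merge spans from several extractors.

    Sources are given in priority order (an assumption of this sketch):
    a span is kept only if it does not overlap a span already accepted.
    """
    accepted: List[Entity] = []
    for source in sources:
        for ent in source:
            if all(ent.end <= a.start or ent.start >= a.end for a in accepted):
                accepted.append(ent)
    return sorted(accepted)
```

In use, the output of a transformer-based tagger (represented here only as a list of `Entity` tuples) could be passed as the first argument to `merge_entities` and the result of `extract_iocs(text)` as the second, so that model predictions take precedence and the heuristics fill in indicators the model did not tag.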


A Machine Learning Approach for the NLP-Based Analysis of Cyber Threats and Vulnerabilities of the Healthcare Ecosystem

The results demonstrate the effectiveness of the proposed approach, which provides a realistic way to assess threats and vulnerabilities from natural language texts, so that it can be adopted in real-world healthcare ecosystems.

Recognizing and Extracting Cybersecurtity-relevant Entities from Text

This work creates an initial unstructured CTI corpus from a variety of open sources, uses it to train and test cybersecurity entity models with the spaCy framework, and explores self-learning methods to automatically recognize cybersecurity entities.

Recognizing and Extracting Cybersecurity Entities from Text

An initial unstructured CTI corpus is created from a variety of open sources and used to train and test cybersecurity entity models with the spaCy framework; self-learning methods are explored to automatically recognize cybersecurity entities.



FLAIR: An Easy-to-Use Framework for State-of-the-Art NLP

The core idea of the FLAIR framework is to present a simple, unified interface for conceptually very different types of word and document embeddings, which effectively hides all embedding-specific engineering complexity and allows researchers to “mix and match” various embeddings with little effort.

The Named Entity Recognition of Chinese Cybersecurity Using an Active Learning Strategy

The experimental results show that TPCL performs better than the traditional strategies in terms of accuracy and F1, and is more suitable for the Chinese entity recognition task in this field.

Automatic extraction of named entities of cyber threats using a deep Bi-LSTM-CRF network

This study proposes a novel approach that automatically extracts core information from CTI reports using a named entity recognition (NER) system and releases 498,000 tag datasets created during the research.

RoBERTa: A Robustly Optimized BERT Pretraining Approach

It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.

Unsupervised Cross-lingual Representation Learning at Scale

It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

MALOnt: An Ontology for Malware Threat Intelligence

An open-source malware ontology, MALOnt is introduced that allows the structured extraction of information and knowledge graph generation, especially for threat intelligence, and enables the analysis, detection, classification, and attribution of cyber threats caused by malware.

An Ontology-driven Knowledge Graph for Android Malware

This work presents MalONT2.0, an ontology for malware threat intelligence that allows researchers to extensively capture all requisite classes and relations that gather semantic and syntactic characteristics of an Android malware attack.

Cybersecurity Named Entity Recognition Using Multi-Modal Ensemble Learning

This work proposes a novel security named entity recognition model based on regular expressions and known-entity dictionary as well as conditional random fields (CRF) combined with four feature templates, named RDF-CRF that can achieve better performance than state-of-the-art methods.

Creating Cybersecurity Knowledge Graphs From Malware After Action Reports

This paper describes a system to extract information from AARs, aggregate the extracted information by fusing similar entities together, and represent that extracted information in a Cybersecurity Knowledge Graph (CKG).