Corpus ID: 230523926

Generating Informative CVE Description From ExploitDB Posts by Extractive Summarization

Jiamou Sun, Zhenchang Xing, Hao Guo, Deheng Ye, Xiaohong Li, Xiwei Xu, Liming Zhu
ExploitDB is an important public website that contributes a large number of vulnerabilities to the official CVE database. Over 60% of these vulnerabilities have high- or critical-security risks. Unfortunately, over 73% of exploits appear publicly earlier than the corresponding CVEs, and about 40% of exploits do not even have CVEs. To assist in documenting CVEs for the ExploitDB posts, we propose an open information method to extract 9 key vulnerability aspects (vulnerable product/version…
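The abstract is truncated before it explains how the extracted aspects become a description, so the following is only a hypothetical illustration, not the authors' method (the title says they use extractive summarization). It slots aspect values into a sentence modeled on the CVE program's key-details phrasing, "[VULNTYPE] in [COMPONENT] in [VENDOR] [PRODUCT] [VERSION] allows [ATTACKER] to [IMPACT] via [VECTOR]". The function name `build_description`, the aspect keys, and the optional "due to" root-cause clause are all assumptions.

```python
from typing import Dict


def build_description(aspects: Dict[str, str]) -> str:
    """Hypothetical sketch: fill a CVE key-details style sentence.

    The aspect keys are illustrative assumptions; missing aspects are skipped.
    """
    parts = []
    # [VULNTYPE], falling back to a generic subject if it was not extracted
    parts.append(aspects.get("vulnerability_type") or "A vulnerability")
    # in [COMPONENT]
    if aspects.get("component"):
        parts.append(f"in {aspects['component']}")
    # in [VENDOR] [PRODUCT] [VERSION]
    product = " ".join(
        aspects[k] for k in ("vendor", "product", "version") if aspects.get(k)
    )
    if product:
        parts.append(f"in {product}")
    # allows [ATTACKER] to [IMPACT]
    if aspects.get("impact"):
        attacker = aspects.get("attacker_type") or "attackers"
        parts.append(f"allows {attacker} to {aspects['impact']}")
    # via [VECTOR]
    if aspects.get("attack_vector"):
        parts.append(f"via {aspects['attack_vector']}")
    # Assumed optional clause for the root-cause aspect
    if aspects.get("root_cause"):
        parts.append(f"due to {aspects['root_cause']}")
    return " ".join(parts) + "."


if __name__ == "__main__":
    example = {
        "vulnerability_type": "SQL injection",
        "component": "login.php",
        "vendor": "Acme",
        "product": "Shop",
        "version": "1.2",
        "attacker_type": "remote attackers",
        "impact": "execute arbitrary SQL commands",
        "attack_vector": "the username parameter",
    }
    print(build_description(example))
```

With the made-up aspects above, this prints "SQL injection in login.php in Acme Shop 1.2 allows remote attackers to execute arbitrary SQL commands via the username parameter." A fixed template is the simplest way to show how such aspects fit together. An extractive approach would instead select sentences from the post itself.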


Few-Sample Named Entity Recognition for Security Vulnerability Reports by Fine-Tuning Pre-Trained Language Models
  • Guanqun Yang, Shay Dineen, Zhipeng Lin, Xueqing Liu
  • Computer Science
  • Deployable Machine Learning for Security Defense
  • 2021
This work investigates the performance of fine-tuning several state-of-the-art pre-trained language models on a small training dataset and demonstrates the effectiveness of few-sample learning for NER on security vulnerability reports.
Linking Common Vulnerabilities and Exposures to the MITRE ATT&CK Framework: A Self-Distillation Approach
A model named the CVE Transformer (CVET) is proposed to label CVEs with one of ten MITRE ATT&CK tactics; empirical results on a gold-standard dataset suggest that the proposed novelties can increase the model's F1-score.
A Survey on Data-driven Software Vulnerability Assessment and Prioritization
This survey provides a taxonomy of past research efforts, highlights best practices for data-driven software vulnerability (SV) assessment and prioritization, discusses current limitations, and proposes potential solutions to address them.


Towards the Detection of Inconsistencies in Public Security Vulnerability Reports
This paper proposes VIEM, an automated system that detects inconsistent information between the fully standardized NVD database and the unstructured CVE descriptions and their referenced vulnerability reports; its results suggest that inconsistent vulnerable software versions are highly prevalent.
ChainSmith: Automatically Learning the Semantics of Malicious Campaigns by Mining Threat Intelligence Reports
  • Ziyun Zhu, T. Dumitras
  • Computer Science
  • 2018 IEEE European Symposium on Security and Privacy (EuroS&P)
  • 2018
The effectiveness of different persuasion techniques used to entice users to download payloads is studied. The study finds that campaigns usually start with social engineering and that the "missing codec" ruse is a common persuasion technique that generates the most suspicious downloads each day.
Categorizing and Predicting Invalid Vulnerabilities on Common Vulnerabilities and Exposures
This work first leverages card sorting to categorize invalid vulnerability reports, identifying six main reasons each for rejected and disputed CVEs, and then proposes a text mining approach to predict invalid vulnerability reports.
Learning to Predict Severity of Software Vulnerability Using Only Vulnerability Description
This paper proposes a deep learning approach that predicts the multi-class severity level of a software vulnerability using only its description. It uses word embeddings and a one-layer shallow Convolutional Neural Network to automatically capture discriminative word and sentence features of vulnerability descriptions.
Joint Prediction of Multiple Vulnerability Characteristics Through Multi-Task Learning
A multi-task machine learning approach jointly predicts multiple vulnerability characteristics from vulnerability descriptions. It removes the requirement for balanced data and relies on neural networks that learn to extract features from training data.
Easy-to-Deploy API Extraction by Multi-Level Feature Embedding and Transfer Learning
A multi-layer neural-network-based architecture for API extraction automatically learns character-, word-, and sentence-level features from the input texts, thus removing the need for manual feature engineering and the dependence on advanced features beyond the input texts.
Learning to Extract API Mentions from Informal Natural Language Discussions
This paper proposes a semi-supervised machine-learning approach that exploits name synonyms and the rich semantic context of API mentions to extract API mentions from informal social text. It significantly outperforms existing API extraction techniques based on language-convention and sentence-format heuristics, as well as an earlier machine-learning-based method for named-entity recognition.
Text Summarization with Pretrained Encoders
This paper introduces a novel document-level encoder based on BERT which is able to express the semantics of a document and obtain representations for its sentences and proposes a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two.
WHYPER: Towards Automating Risk Assessment of Mobile Applications
WHYPER, a framework using Natural Language Processing (NLP) techniques to identify sentences that describe the need for a given permission in an application description, demonstrates great promise in using NLP techniques to bridge the semantic gap between user expectations and application functionality, further aiding the risk assessment of mobile applications.
SemFuzz: Semantics-based Automatic Generation of Proof-of-Concept Exploits
SemFuzz is presented, a novel technique that leverages vulnerability-related text to guide the automatic generation of PoC exploits. It handles vulnerability types never before attacked automatically, indicating that more complicated flaws can also be attacked automatically.