A Machine Learning Approach for Vulnerability Curation
@article{Chen2020AML, title={A Machine Learning Approach for Vulnerability Curation}, author={Yang Chen and Andrew E. Santosa and Ang Ming Yi and Abhishek Sharma and Asankhaya Sharma and D. Lo}, journal={Proceedings of the 17th International Conference on Mining Software Repositories}, year={2020} }
Software composition analysis depends on a database of open-source library vulnerabilities, curated by security researchers using various sources, such as bug tracking systems, commits, and mailing lists. We report the design and implementation of a machine learning system to help the curation by automatically predicting the vulnerability-relatedness of each data item. It supports a complete pipeline from data collection, model training and prediction, to the validation of new models before…
7 Citations
Security Issue Classification for Vulnerability Management with Semi-supervised Learning
- Computer Science
- ICISSP
- 2022
This work proposes the use of semi-supervised machine learning to classify issues as security-related to provide additional vulnerabilities in an automated pipeline, and its models, based on a Hierarchical Attention Network, outperform previously proposed models on a manually labelled test dataset.
TRACER: Finding Patches for Open Source Software Vulnerabilities
- Computer Science
- ArXiv
- 2021
An empirical study is conducted to understand the quality and characteristics of patches for OSS vulnerabilities in two state-of-the-art vulnerability databases and the first automated approach, named TRACER, is proposed, to find patches for an OSS vulnerability from multiple sources.
SPI: Automated Identification of Security Patches via Commits
- Computer Science
- ACM Trans. Softw. Eng. Methodol.
- 2022
A deep learning-based security patch identification system that consists of two composite neural networks: one that utilizes pretrained word representations learned from commits of open source repositories, and one code-revision neural network that takes code before and after revision and learns the distinction on the statement level.
Predictive Models in Software Engineering: Challenges and Opportunities
- Computer Science
- ACM Transactions on Software Engineering and Methodology
- 2022
This work describes the key models and approaches used, classifies the different models, summarizes the range of key application areas, analyzes research results, and provides a proposed research road map for these opportunities.
A Survey on Data-driven Software Vulnerability Assessment and Prioritization
- Computer Science
- ACM Computing Surveys
- 2022
This survey provides a taxonomy of past research efforts, highlights the best practices for data-driven SV assessment and prioritization, discusses the current limitations, and proposes potential solutions to address them.
Security Bug Report Usage for Software Vulnerability Research: A Systematic Mapping Study
- Computer Science
- IEEE Access
- 2021
Findings from a systematic mapping study of research that uses security bug reports for software vulnerability research can be leveraged to identify research opportunities in the domains of software vulnerability classification and automated vulnerability repair techniques.
Automated Identification of Libraries from Vulnerability Data: Can We Do Better?
- Computer Science
- 2022
Software engineers depend heavily on software libraries and have to update their dependencies once vulnerabilities are found in them. Software Composition Analysis (SCA) helps developers identify…
References
SHOWING 1-10 OF 43 REFERENCES
Automated Identification of Libraries from Vulnerability Data
- Computer Science
- 2020 IEEE/ACM 42nd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
- 2020
This work formulates and solves, for the first time, library name identification from NVD data as an extreme multi-label learning (XML) problem, and deploys the solution in a complete production system.
Machine learning for finding bugs: An initial report
- Computer Science
- 2017 IEEE Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE)
- 2017
While on the surface the initial results were encouraging, further investigation suggests that the machine learning techniques used are not suitable replacements for static program analysis tools due to low precision of the results.
Toward Large-Scale Vulnerability Discovery using Machine Learning
- Computer Science
- CODASPY
- 2016
This paper presents an approach that uses lightweight static and dynamic features to predict, with machine learning techniques, whether a test case is likely to contain a software vulnerability, and introduces VDiscover, a tool that uses state-of-the-art machine learning techniques to predict vulnerabilities in test cases.
A Practical Approach to the Automatic Classification of Security-Relevant Commits
- Computer Science
- 2018 IEEE International Conference on Software Maintenance and Evolution (ICSME)
- 2018
An approach that uses machine-learning to analyze source code repositories and to automatically identify commits that are security-relevant (i.e., that are likely to fix a vulnerability) is proposed, requiring a significantly smaller amount of training data and employing a simpler architecture.
Vulnerability Extrapolation: Assisted Discovery of Vulnerabilities Using Machine Learning
- Computer Science
- WOOT
- 2011
This paper proposes a method for assisted discovery of vulnerabilities in source code by embedding code in a vector space and automatically determining API usage patterns using machine learning, which can be exploited to guide the auditing of code and to identify potentially vulnerable code with similar characteristics.
Automated identification of security issues from commit messages and bug reports
- Computer Science
- ESEC/SIGSOFT FSE
- 2017
This work describes an efficient automatic vulnerability identification system geared towards tracking large-scale projects in real time using natural language processing and machine learning techniques and achieves promising results on vulnerability identification.
VCCFinder: Finding Potential Vulnerabilities in Open-Source Projects to Assist Code Audits
- Computer Science
- CCS
- 2015
A new method of finding potentially dangerous code in code repositories with a significantly lower false-positive rate than comparable systems is presented, which combines code-metric analysis with metadata gathered from code repositories to help code review teams prioritize their work.
Automated vulnerability detection system based on commit messages
- Computer Science
- 2019
A large-scale crawling of Git commits for some popular open source repositories is conducted, a web-based triage system is developed, and a deep neural network is implemented to automatically identify vulnerability-fixing commits (VFC) based on the commit messages.
When a Patch Goes Bad: Exploring the Properties of Vulnerability-Contributing Commits
- Computer Science
- 2013 ACM / IEEE International Symposium on Empirical Software Engineering and Measurement
- 2013
This study traced 68 vulnerabilities in the Apache HTTP server back to the version control commits that originally contributed the vulnerable code, and showed that vulnerability-contributing commits (VCCs) are large: they have more than twice as much code churn on average as non-VCCs, even when normalized against lines of code.
The importance of accounting for real-world labelling when predicting software vulnerabilities
- Computer Science
- ESEC/SIGSOFT FSE
- 2019
The results reveal that the unrealistic labelling assumption can profoundly mislead the scientific conclusions drawn, suggesting that highly effective and deployable prediction results vanish when the authors fully account for realistically available labelling in the experimental methodology.