A Survey on Machine Learning Techniques for Source Code Analysis
@article{Sharma2021ASO, title={A Survey on Machine Learning Techniques for Source Code Analysis}, author={Tushar Sharma and Maria Kechagia and Stefanos Georgiou and Rohit Tiwari and Federica Sarro}, journal={ArXiv}, year={2021}, volume={abs/2110.09610} }
19 Citations
Inspect4py: A Knowledge Extraction Framework for Python Code Repositories
- Computer Science
- 2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)
- 2022
Inspect4py is a static code analysis framework designed to automatically extract the main features, metadata and documentation of Python code repositories and aims to ease the understandability and adoption of software repositories by other researchers and developers.
Learning to Represent Programs with Heterogeneous Graphs
- Computer Science
- 2022 IEEE/ACM 30th International Conference on Program Comprehension (ICPC)
- 2022
This paper proposes the heterogeneous program graph (HPG), which makes node and edge types explicit, and employs the heterogeneous graph transformer (HGT) architecture to generate representations from the HPG while taking type information into account.
NeuDep: neural binary memory dependence analysis
- Computer Science
- ESEC/SIGSOFT FSE
- 2022
This work presents a new machine-learning-based approach to predict memory dependencies by exploiting the model's learned knowledge about how binary programs execute, and demonstrates that NeuDep is more precise and faster than the current state-of-the-art on these tasks.
Enabling Automatic Repair of Source Code Vulnerabilities Using Data-Driven Methods
- Computer Science
- 2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
- 2022
This work proposes ways to improve code representations for vulnerability repair from three perspectives: input data type, data-driven models, and downstream tasks.
FineCodeAnalyzer: Multi-Perspective Source Code Analysis Support for Software Developer Through Fine-Granular Level Interactive Code Visualization
- Computer Science
- IEEE Access
- 2022
This work proposes a tool, FineCodeAnalyzer, that supports interactive source code analysis grounded in fine-grained structural and historical relations between source code elements; it outperforms developers' self-adopted strategies in locating code elements.
Extracting Label-specific Key Input Features for Neural Code Intelligence Models
- Computer Science
- ArXiv
- 2022
Extracting key input features from reduced programs reveals that syntax-guided reduced programs contain more label-specific key input features, which may help explain the reasoning behind models' predictions from different perspectives and increase trust in the correct classifications given by CI models.
Synthesizing Tests with Oracles Using Structured Natural Language Specifications
- Computer Science
- 2021
The overarching goal of the research is to automate software debugging by using natural language software artifacts and aid software engineers in developing high-quality software.
A Survey of Automatic Source Code Summarization
- Computer Science
- Symmetry
- 2022
This review covers the development of automatic source code summarization (ASCS) technology, which involves source code modeling, code summarization generation, and quality evaluation; it categorizes existing ASCS techniques by these stages and analyzes their advantages and shortcomings.
Syntax-guided program reduction for understanding neural code intelligence models
- Computer Science
- MAPS@PLDI
- 2022
This work applies a syntax-guided program reduction technique that considers the grammar of the input programs during reduction; it is faster and yields smaller sets of key tokens in the reduced programs.
Code2Snapshot: Using Code Snapshots for Learning Representations of Source Code
- Computer Science
- 2021
This paper investigates Code2Snapshot, a novel representation of source code that is based on snapshots of input programs, and evaluates several variations of this representation and compares its performance with state-of-the-art representations that utilize the rich syntactic and semantic features of input programs.
References
Juliet 1.1 C/C++ and Java Test Suite
- Computer Science
- Computer
- 2012
Juliet Test Suite 1.1 offers test cases for assessing the effectiveness of static analyzers and other software-assurance tools.
Automated support for diagnosis and repair
- Computer Science
- Commun. ACM
- 2015
Model checking and logic-based learning together deliver automated support, especially in adaptive and autonomous systems.
Automated program repair
- Business
- Commun. ACM
- 2019
This presentation explains how to design and implement an automated program repair system that automates the very labor-intensive and therefore time-heavy and expensive process of manually fixing programming mistakes.
Capturing source code semantics via tree-based convolution over API-enhanced AST
- Computer Science
- CF
- 2019
This work proposes to use tree-based convolution over API-enhanced AST to detect semantic clones, that is, code fragments with similar semantics but dissimilar syntax, and proposes architectures that incorporate the approach for code search and code summarization.
A Machine Learning Approach for Vulnerability Curation
- Computer Science
- MSR
- 2020
This paper reports the design and implementation of a machine learning system that helps curation by automatically predicting the vulnerability-relatedness of each data item, and finds no uniform ordering of word2vec parameter sensitivity across data sources.
Semantic Clone Detection Using Machine Learning
- Computer Science
- 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA)
- 2016
A machine learning framework to automatically detect clones in software is presented; it is able to detect Type-3 clones and the most complicated kind of clones, Type-4 clones.
Semantic Feature Learning via Dual Sequences for Defect Prediction
- Computer Science
- IEEE Access
- 2021
This paper proposes a framework called Semantic Feature Learning via Dual Sequences (SFLDS), which can capture the semantic and structural information in the AST for feature generation and uses a bi-directional long short-term memory (BiLSTM) based neural network to automatically generate semantic features from the dual sequences for SDP.
A Transformer-based Approach for Source Code Summarization
- Computer Science
- ACL
- 2020
This work explores the Transformer model, which uses a self-attention mechanism and has been shown to be effective in capturing long-range dependencies, for source code summarization, and shows that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.
DeepCommenter: a deep code comment generation tool with hybrid lexical and syntactical information
- Computer Science
- ESEC/SIGSOFT FSE
- 2020
DeepCommenter formulates the comment generation task as a machine translation problem and exploits a deep neural network that combines the lexical and structural information of Java methods to generate descriptive comments for Java methods.
A Novel Neural Source Code Representation Based on Abstract Syntax Tree
- Computer Science
- 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)
- 2019
This paper proposes a novel AST-based Neural Network (ASTNN) for source code representation that splits each large AST into a sequence of small statement trees, and encodes the statement trees to vectors by capturing the lexical and syntactical knowledge of statements.