• Corpus ID: 239024353

A Survey on Machine Learning Techniques for Source Code Analysis

@article{Sharma2021ASO,
  title={A Survey on Machine Learning Techniques for Source Code Analysis},
  author={Tushar Sharma and Maria Kechagia and Stefanos Georgiou and Rohit Tiwari and Federica Sarro},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.09610}
}

Inspect4py: A Knowledge Extraction Framework for Python Code Repositories

  • Rosa Filgueira, D. Garijo
  • Computer Science
    2022 IEEE/ACM 19th International Conference on Mining Software Repositories (MSR)
  • 2022
Inspect4py is a static code analysis framework designed to automatically extract the main features, metadata and documentation of Python code repositories and aims to ease the understandability and adoption of software repositories by other researchers and developers.

Learning to Represent Programs with Heterogeneous Graphs

This paper proposes the heterogeneous program graph (HPG), which provides the types of the nodes and the edges explicitly, and employs the heterogeneous graph transformer (HGT) architecture to generate representations based on HPG, taking the type information into account during processing.

NeuDep: neural binary memory dependence analysis

This work presents a new machine-learning-based approach to predict memory dependencies by exploiting the model's learned knowledge about how binary programs execute, and demonstrates that NeuDep is more precise and faster than the current state-of-the-art on these tasks.

Enabling Automatic Repair of Source Code Vulnerabilities Using Data-Driven Methods

  • Anastasiia Grishina
  • Computer Science
    2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion)
  • 2022
This work proposes ways to improve code representations for vulnerability repair from three perspectives: input data type, data-driven models, and downstream tasks.

FineCodeAnalyzer: Multi-Perspective Source Code Analysis Support for Software Developer Through Fine-Granular Level Interactive Code Visualization

This work proposes a tool, FineCodeAnalyzer, that supports interactive source code analysis grounded in fine-grained structural and historical relations between source code elements, and that outperforms developers’ self-adopted strategies in locating code elements.

Extracting Label-specific Key Input Features for Neural Code Intelligence Models

Extracting key input features from reduced programs reveals that syntax-guided reduced programs contain more label-specific key input features, which may help to explain the reasoning behind models’ predictions from different perspectives and increase trust in the correct classifications given by CI models.

1.1 Synthesizing Tests with Oracles Using Structured Natural Language Specifications

The overarching goal of the research is to automate software debugging by using natural language software artifacts and aid software engineers in developing high-quality software.

A Survey of Automatic Source Code Summarization

A review of the development of ASCS technology, which involves source code modeling, code summarization generation, and quality evaluation; it categorizes the existing ASCS techniques based on these stages and analyzes their advantages and shortcomings.

Syntax-guided program reduction for understanding neural code intelligence models

This work applies a syntax-guided program reduction technique that considers the grammar of the input programs during reduction; the technique is faster and provides smaller sets of key tokens in reduced programs.

Code2Snapshot: Using Code Snapshots for Learning Representations of Source Code

This paper investigates Code2Snapshot, a novel representation of source code that is based on snapshots of input programs, evaluates several variations of this representation, and compares their performance with state-of-the-art representations that utilize the rich syntactic and semantic features of input programs.

References

Juliet 1.1 C/C++ and Java Test Suite

Juliet Test Suite 1.1 offers test cases for assessing the effectiveness of static analyzers and other software-assurance tools.

Automated support for diagnosis and repair

Model checking and logic-based learning together deliver automated support, especially in adaptive and autonomous systems.

Automated program repair

This presentation explains how to design and implement an automated program repair system that automates the very labor-intensive and therefore time-heavy and expensive process of manually fixing programming mistakes.

Capturing source code semantics via tree-based convolution over API-enhanced AST

This work proposes to use tree-based convolution over API-enhanced AST to detect semantic clones (code fragments with similar semantics but dissimilar syntax), and proposes architectures that incorporate the approach for code search and code summarization.

A Machine Learning Approach for Vulnerability Curation

The design and implementation of a machine learning system that helps curation by automatically predicting the vulnerability-relatedness of each data item is reported, along with the finding that there is no uniform ordering of word2vec parameter sensitivity across data sources.

Semantic Clone Detection Using Machine Learning

A machine learning framework to automatically detect clones in software, which is able to detect Type-3 clones and the most complicated kind of clones, Type-4 clones, is presented.

Semantic Feature Learning via Dual Sequences for Defect Prediction

This paper proposes a framework called Semantic Feature Learning via Dual Sequences (SFLDS), which can capture the semantic and structural information in the AST for feature generation and uses a bi-directional long short-term memory (BiLSTM) based neural network to automatically generate semantic features from the dual sequences for SDP.

A Transformer-based Approach for Source Code Summarization

This work explores the Transformer model, which uses a self-attention mechanism and has been shown to be effective in capturing long-range dependencies, for source code summarization, and shows that although the approach is simple, it outperforms the state-of-the-art techniques by a significant margin.

DeepCommenter: a deep code comment generation tool with hybrid lexical and syntactical information

DeepCommenter formulates the comment generation task as a machine translation problem and exploits a deep neural network that combines the lexical and structural information of Java methods to generate descriptive comments for Java methods.

A Novel Neural Source Code Representation Based on Abstract Syntax Tree

This paper proposes a novel AST-based Neural Network (ASTNN) for source code representation that splits each large AST into a sequence of small statement trees, and encodes the statement trees to vectors by capturing the lexical and syntactical knowledge of statements.
...