Understanding neural code intelligence through program simplification
@article{Rabin2021UnderstandingNC,
  title   = {Understanding neural code intelligence through program simplification},
  author  = {Md Rafiqul Islam Rabin and Vincent J. Hellendoorn and Mohammad Amin Alipour},
  journal = {Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering},
  year    = {2021}
}
A wide range of code intelligence (CI) tools, powered by deep neural networks, have been developed recently to improve programming productivity and perform program analysis. To reliably use such tools, developers often need to reason about the behavior of the underlying models and the factors that affect them. This is especially challenging for tools backed by deep neural networks. Various methods have tried to reduce this opacity in the vein of "transparent/interpretable-AI". However, these…
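The simplification technique at the heart of the paper works in the spirit of delta debugging: repeatedly delete parts of the input program and keep any deletion after which the model still makes the same prediction. The sketch below illustrates that loop; the token-list input and the `predict` function are assumptions made for the example, not the paper's actual interface.

```python
# Prediction-preserving program reduction in the spirit of delta debugging
# (ddmin). `predict` stands in for any code-intelligence model that maps a
# token list to a label; it is an assumed interface, not the paper's.

def reduce_program(tokens, predict):
    """Greedily delete chunks of `tokens` as long as `predict` keeps
    returning the label it gave for the full program."""
    target = predict(tokens)
    n = 2  # number of chunks to try deleting
    while len(tokens) >= 2:
        chunk = max(len(tokens) // n, 1)
        reduced = False
        for start in range(0, len(tokens), chunk):
            candidate = tokens[:start] + tokens[start + chunk:]
            if candidate and predict(candidate) == target:
                tokens = candidate           # deletion kept the prediction
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(tokens):
                break                        # no single deletion works: done
            n = min(n * 2, len(tokens))      # retry at finer granularity
    return tokens

# Toy usage: a "model" whose prediction hinges on one give-away token.
toy_predict = lambda toks: "sort" if "swap" in toks else "other"
program = "void f ( int a [ ] ) { swap ( a , 0 , 1 ) ; }".split()
print(reduce_program(program, toy_predict))  # shrinks toward ['swap']
```

The tokens that survive such a reduction are the candidates for what the model actually relies on when making its prediction.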
14 Citations
Extracting Label-specific Key Input Features for Neural Code Intelligence Models
- Computer Science
- 2022
Extracting key input features from reduced programs reveals that syntax-guided reduced programs contain more label-specific key input features, which may help to explain the reasoning behind models' predictions from different perspectives and increase trust in the classifications given by CI models.
Syntax-guided program reduction for understanding neural code intelligence models
- Computer Science, MAPS@PLDI
- 2022
A syntax-guided program reduction technique that considers the grammar of the input programs during reduction is applied; it is faster and yields smaller sets of key tokens in the reduced programs.
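Compared with token-level deletion, a grammar-aware reducer only proposes candidates that remain syntactically valid, for instance by deleting whole statements from the AST. A rough sketch of that idea follows; it is not the paper's implementation, `predict` is again an assumed stand-in for a CI model, and `ast.unparse` needs Python 3.9+.

```python
import ast

def syntax_guided_reduce(source, predict):
    """Delete whole statements while the model's prediction on the program
    stays unchanged, so every candidate is valid by construction. For
    brevity only statement lists named `body` are tried (not `orelse` or
    `finalbody`)."""
    target = predict(source)
    tree = ast.parse(source)
    changed = True
    while changed:
        changed = False
        for node in ast.walk(tree):
            body = getattr(node, "body", None)
            if not isinstance(body, list) or len(body) < 2:
                continue
            for i in range(len(body)):
                removed = body.pop(i)
                if predict(ast.unparse(tree)) == target:
                    changed = True           # keep the deletion, rescan
                    break
                body.insert(i, removed)      # restore, try the next one
            if changed:
                break
    return ast.unparse(tree)

# Toy usage: the "model" labels any program containing '+' as an adder.
src = "def f(x):\n    y = 1\n    print('hi')\n    return x + y\n"
label = lambda s: "adder" if "+" in s else "other"
print(syntax_guided_reduce(src, label))  # keeps only the return statement
```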
Learning to Represent Programs with Code Hierarchies
- Computer Science, ArXiv
- 2022
A novel network architecture, HIRGAST, is designed that combines the strengths of Heterogeneous Graph Transformer Networks and Tree-based Convolutional Neural Networks to learn Abstract Syntax Trees enriched with code dependency information, and a novel pretraining objective called Missing Subtree Prediction is proposed.
Memorization and Generalization in Neural Code Intelligence Models
- Computer Science, SSRN Electronic Journal
- 2022
This work evaluates the memorization and generalization tendencies in neural code intelligence models through a case study across several benchmarks and model families by leveraging established approaches from other fields that use DNNs, such as introducing targeted noise into the training dataset.
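The "targeted noise" probe mentioned above can be as simple as flipping a fraction of training labels and checking how readily the model still fits them; a model that fits flipped labels as well as clean ones is memorizing rather than generalizing. A minimal sketch, with the dataset format assumed for illustration:

```python
import random

def inject_label_noise(dataset, labels, rate=0.1, seed=0):
    """Flip a fraction of labels in (example, label) pairs to a different,
    randomly chosen label. Training accuracy on the flipped examples is
    then a direct signal of memorization."""
    rng = random.Random(seed)
    noisy = []
    for x, y in dataset:
        if rng.random() < rate:
            y = rng.choice([l for l in labels if l != y])
        noisy.append((x, y))
    return noisy
```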
Counterfactual Explanations for Models of Code
- Computer Science, 2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
- 2022
This paper integrates counterfactual explanation generation to models of source code in a real-world setting and investigates the efficacy of the approach on three different models, each based on a BERT-like architecture operating over source code.
Data-Driven AI Model Signal-Awareness Enhancement and Introspection
- Computer Science
- 2021
This paper combines the SE concept of code complexity with the AI technique of curriculum learning, and incorporates SE assistance into AI models by customizing Delta Debugging to generate simplified signal-preserving programs and augmenting the training dataset with them.
Towards Reliable AI for Source Code Understanding
- Computer Science, SoCC
- 2021
This work highlights the need for concerted efforts from the research community to ensure credibility, accountability, and traceability for AI-for-code, and outlines three stages of an AI pipeline: data collection, model training, and prediction analysis.
Code2Snapshot: Using Code Snapshots for Learning Representations of Source Code
- Computer Science
- 2021
This paper investigates Code2Snapshot, a novel representation of the source code that is based on the snapshots of input programs, and evaluates several variations of this representation and compares its performance with state-of-the-art representations that utilize the rich syntactic and semantic features of input programs.
References
SHOWING 1-10 OF 45 REFERENCES
Testing Neural Program Analyzers
- Computer Science
- 2019
In a preliminary experiment on a neural model recently proposed in the literature, it is found that the model is very brittle, and simple perturbations in the input can cause the model to make mistakes in its prediction.
Evaluation of Generalizability of Neural Program Analyzers under Semantic-Preserving Transformations
- Computer Science, ArXiv
- 2020
A large-scale evaluation of the generalizability of two popular neural program analyzers using seven semantically equivalent program transformations, providing the initial stepping stones for quantifying robustness in neural program analyzers.
On the generalizability of Neural Program Models with respect to semantic-preserving program transformations
- Computer Science, Inf. Softw. Technol.
- 2021
Neural Program Repair by Jointly Learning to Localize and Repair
- Computer Science, ICLR
- 2019
It is beneficial to train a model that jointly and directly localizes and repairs variable-misuse bugs, and the experimental results show that the joint model significantly outperforms an enumerative solution that uses a pointer-based model for repair alone.
Learning to Represent Programs with Graphs
- Computer Science, ICLR
- 2018
This work proposes to use graphs to represent both the syntactic and semantic structure of code and use graph-based deep learning methods to learn to reason over program structures, and suggests that these models learn to infer meaningful names and to solve the VarMisuse task in many cases.
On the "naturalness" of buggy code
- Computer Science, ICSE
- 2016
It is found that code with bugs tends to be more entropic (i.e., unnatural), becoming less so as bugs are fixed, suggesting that entropy may be a valid, simple way to complement the effectiveness of PMD or FindBugs, and that search-based bug-fixing methods may benefit from using entropy both for fault localization and for searching for fixes.
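Concretely, "entropic" here means high average surprisal under a language model trained on a code corpus. The study used full-scale n-gram language models; the sketch below only illustrates the computation itself, using an add-k-smoothed bigram model as a simplifying assumption.

```python
import math
from collections import Counter

def avg_token_entropy(seq, corpus, k=0.1):
    """Average negative log2-probability of `seq` under an add-k-smoothed
    bigram model trained on `corpus` (a list of token lists). Higher
    values mean the sequence is less 'natural' w.r.t. the corpus."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for toks in corpus:
        vocab.update(toks)
        unigrams.update(toks[:-1])           # bigram left-contexts
        bigrams.update(zip(toks, toks[1:]))
    v = len(vocab)
    pairs = list(zip(seq, seq[1:]))
    nll = sum(-math.log2((bigrams[(a, b)] + k) / (unigrams[a] + k * v))
              for a, b in pairs)
    return nll / len(pairs)
```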
Toward Deep Learning Software Repositories
- Computer Science, 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories
- 2015
This work motivates deep learning for software language modeling, highlighting fundamental differences between state-of-the-practice software language models and connectionist models, and proposes avenues for future work where deep learning can be brought to bear to support model-based testing, improve software lexicons, and conceptualize software artifacts.
A Survey of Machine Learning for Big Code and Naturalness
- Computer Science, ACM Comput. Surv.
- 2018
This article presents a taxonomy based on the underlying design principles of each model and uses it to navigate the literature and discuss cross-cutting and application-specific challenges and opportunities.
AutoFocus: Interpreting Attention-Based Neural Networks by Code Perturbation
- Computer Science, 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE)
- 2019
Based on an evaluation of more than 1000 programs for 10 different sorting algorithms, it is observed that the attention scores are highly correlated with the effects of the perturbed code elements, which provides a strong basis for the use of attention scores to interpret the relations between code elements and the algorithm classification results of a neural network.
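The perturbation side of such a comparison can be approximated with an occlusion-style score: mask one code element at a time, record how much the model's confidence in its original label drops, and rank-correlate those drops with the attention weights. A sketch under an assumed interface (`predict_proba`, returning a label-to-probability dict, is not from the paper):

```python
def occlusion_scores(tokens, predict_proba, mask="<unk>"):
    """Per-token importance: the drop in the model's confidence in its
    original top label when that token is replaced by a mask."""
    base = predict_proba(tokens)
    top = max(base, key=base.get)
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + [mask] + tokens[i + 1:]
        scores.append(base[top] - predict_proba(masked).get(top, 0.0))
    return scores  # e.g., Spearman-correlate with attention weights
```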
Cause reduction: delta debugging, even without bugs
- Computer Science, Softw. Test. Verification Reliab.
- 2016
Suites produced by cause reduction provide effective quick tests for real-world programs, including improving seeded symbolic execution, where using reduced tests can often double the number of additional branches explored.