• Corpus ID: 3495200

Learning to Represent Programs with Graphs

@article{Allamanis2018LearningTR,
  title={Learning to Represent Programs with Graphs},
  author={Miltiadis Allamanis and Marc Brockschmidt and Mahmoud Khademi},
  journal={ArXiv},
  year={2018},
  volume={abs/1711.00740}
}
Learning tasks on source code (i.e., formal languages) have been considered recently, but most work has tried to transfer natural language methods and does not capitalize on the unique opportunities offered by code's known syntax. We propose to use graphs to represent both the syntactic and semantic structure of code and use graph-based deep learning methods to learn to reason over program structures, evaluating on two tasks: VarNaming, in which a network predicts the name of a variable given its usage, and VarMisuse, in which the network selects the correct variable to use at a given program location. Additionally, our testing showed that VarMisuse identifies a number of bugs in mature open-source projects.
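To make the graph construction concrete, below is a minimal, illustrative sketch, not the paper's implementation (the paper works on C# via a compiler front end and uses a richer typed edge set, e.g. NextToken, LastWrite, LastUse, ComputedFrom). The sketch builds a toy program graph over Python's ast module, with syntax ("Child") edges plus simplified "LastUse" edges linking successive occurrences of the same variable; the function name build_program_graph and the choice of Python are assumptions for illustration only.

import ast

def build_program_graph(source: str):
    """Return (nodes, edges) for a toy program graph of `source`."""
    tree = ast.parse(source)
    nodes, edges = [], []            # edges are (src_id, edge_type, dst_id)
    node_ids = {}

    def visit(n):
        node_ids[id(n)] = len(nodes)
        nodes.append(type(n).__name__)
        nid = node_ids[id(n)]
        # Syntax ("Child") edges from a pre-order walk of the AST.
        for child in ast.iter_child_nodes(n):
            edges.append((nid, "Child", visit(child)))
        return nid

    visit(tree)

    # Simplified "LastUse" edges: link each occurrence of a variable name
    # to the previous occurrence of the same name, in source order.
    names = sorted(
        (n for n in ast.walk(tree) if isinstance(n, ast.Name)),
        key=lambda n: (n.lineno, n.col_offset),
    )
    last_seen = {}
    for n in names:
        if n.id in last_seen:
            edges.append((node_ids[id(n)], "LastUse", last_seen[n.id]))
        last_seen[n.id] = node_ids[id(n)]

    return nodes, edges

if __name__ == "__main__":
    nodes, edges = build_program_graph("x = 1\ny = x + 2\nprint(x, y)")
    print(len(nodes), "nodes")
    print([e for e in edges if e[1] == "LastUse"])

In the paper itself, node states are then propagated over such typed edges with a Gated Graph Neural Network; this sketch covers only a simplified graph construction step.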

Citations

Compiler-based graph representations for deep learning models of code
TLDR
This paper uses graph neural networks (GNNs) to learn predictive compiler tasks on two representations based on ASTs and CDFGs; the approach significantly outperforms the state of the art on heterogeneous OpenCL mapping while providing orders-of-magnitude faster inference times, which is crucial for compiler optimizations.
Learning semantic program embeddings with graph interval neural network
TLDR
GINN is shown to be a general, powerful deep neural network for learning precise, semantic program embeddings; it is evaluated on two popular downstream applications, variable misuse prediction and method name prediction, where it outperforms state-of-the-art models by a comfortable margin.
Learning to Represent Programs with Code Hierarchies
TLDR
A novel network architecture, HIRGAST, is designed that combines the strengths of Heterogeneous Graph Transformer Networks and Tree-based Convolutional Neural Networks to learn over Abstract Syntax Trees enriched with code dependency information; a novel pretraining objective called Missing Subtree Prediction is also proposed.
A Hybrid Approach for Learning Program Representations
TLDR
A new deep neural network, LIGER, is introduced, which learns program representations from a mixture of symbolic and concrete execution traces, and significantly outperforms code2seq, the previous state-of-the-art.
Deep Program Structure Modeling Through Multi-Relational Graph-based Learning
TLDR
POEM is a novel framework that automatically learns useful code representations from graph-based program structures, using a graph neural network specially designed to capture the syntactic and semantic information from the program's abstract syntax tree and its control and data flow graph.
Learning to Fix Build Errors with Graph2Diff Neural Networks
TLDR
This work presents a new deep learning architecture, called Graph2Diff, for automatically localizing and fixing build errors, which represents source code, build configuration files, and compiler diagnostic messages as a graph, and uses a Graph Neural Network model to predict a diff.
Open Vocabulary Learning on Source Code with a Graph-Structured Cache
TLDR
Combining this graph-structured cache strategy with recent Graph-Neural-Network-based models for supervised learning on code improves the models' performance on a code completion task and a variable naming task, at the cost of a moderate increase in computation time.
Universal Representation for Code
TLDR
This work presents effective pre-training strategies on top of a novel graph-based code representation to produce universal representations for code, and reveals discriminative properties of the resulting universal code representation.
Learning Blended, Precise Semantic Program Embeddings
TLDR
This paper introduces a new deep neural network, LIGER, which learns program representations from a mixture of symbolic and concrete execution traces, and significantly outperforms code2seq, the previous state-of-the-art for method name prediction.
When deep learning met code search
TLDR
This paper assembles implementations of state-of-the-art code search techniques to run on a common platform with shared training and evaluation corpora, and introduces a new design point: a minimal-supervision extension of an existing unsupervised technique.
...
...

References

SHOWING 1-10 OF 30 REFERENCES
Gated Graph Sequence Neural Networks
TLDR
This work studies feature learning techniques for graph-structured inputs and achieves state-of-the-art performance on a problem from program verification, in which subgraphs need to be matched to abstract data structures.
Learning Python Code Suggestion with a Sparse Pointer Network
TLDR
A neural language model with a sparse pointer network is proposed, aimed at capturing very long-range dependencies; a qualitative analysis shows that the model indeed captures interesting long-range dependencies, such as referring to a class member defined over 60 tokens in the past.
Suggesting accurate method and class names
TLDR
A neural probabilistic language model for source code, specifically designed for the method naming problem, is introduced, along with a variant that is, to the authors' knowledge, the first that can propose neologisms: names that have not appeared in the training corpus.
A Survey of Machine Learning for Big Code and Naturalness
TLDR
This article presents a taxonomy based on the underlying design principles of each model and uses it to navigate the literature and discuss cross-cutting and application-specific challenges and opportunities.
Probabilistic model for code with decision trees
TLDR
The key idea is to phrase the problem of learning a probabilistic model of code as learning a decision tree in a domain-specific language over abstract syntax trees (called TGen), which allows the prediction of a program element to be conditioned on a dynamically computed context.
Predicting Program Properties from "Big Code"
TLDR
This work formulates the problem of inferring program properties as structured prediction and shows how to perform both learning and inference in this context, opening up new possibilities for attacking a wide range of difficult problems in the context of "Big Code", including invariant generation, decompilation, synthesis, and others.
Structured Generative Models of Natural Source Code
TLDR
A family of generative models for natural source code (NSC) is presented that has three key properties: first, the models incorporate both sequential and hierarchical structure; second, they learn a distributed representation of source code elements; and third, they integrate closely with a compiler.
A Convolutional Attention Network for Extreme Summarization of Source Code
TLDR
An attentional neural network is introduced that employs convolution on the input tokens to detect local, time-invariant and long-range topical attention features in a context-dependent way, in order to solve the problem of extreme summarization of source code snippets into short, descriptive, function-name-like summaries.
Code completion with statistical language models
TLDR
The main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences, and to design a simple and scalable static analysis that extracts sequences of method calls from a large codebase and indexes these into a statistical language model.
node2vec: Scalable Feature Learning for Networks
TLDR
In node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks, a flexible notion of a node's network neighborhood is defined and a biased random walk procedure is designed, which efficiently explores diverse neighborhoods.
...
...