Learning to Represent Programs with Graphs
TLDR
This work proposes using graphs to represent both the syntactic and semantic structure of code, applies graph-based deep learning methods to reason over program structures, and shows that these models learn to infer meaningful names and to solve the VarMisuse task in many cases.
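The graph representation summarized above combines syntax with semantic relationships. A minimal sketch, assuming only syntactic Child edges over Python's own AST (the paper additionally uses semantic edges such as data flow between variable uses; node ids and edge naming here are illustrative, not the paper's implementation):

```python
import ast

def program_graph(source):
    """Build a tiny program graph: one node per AST node, Child edges only."""
    tree = ast.parse(source)
    nodes, edges, ids = [], [], {}
    # Assign an integer id to every AST node and record its type name.
    for node in ast.walk(tree):
        ids[id(node)] = len(nodes)
        nodes.append(type(node).__name__)
    # Add a labeled Child edge from each node to its direct children.
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            edges.append((ids[id(node)], "Child", ids[id(child)]))
    return nodes, edges

nodes, edges = program_graph("x = 1\ny = x + 1")
```

A graph neural network would then propagate information along these edges to score, for example, which variable belongs in a given slot (the VarMisuse task).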
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
TLDR
Describes the methodology used to obtain the corpus and expert labels, along with a number of simple baseline solutions for the task.
A Convolutional Attention Network for Extreme Summarization of Source Code
TLDR
Introduces an attentional neural network that employs convolution over the input tokens to detect local time-invariant and long-range topical attention features in a context-dependent way, addressing the problem of extreme summarization: condensing source code snippets into short, descriptive, function-name-like summaries.
Mining source code repositories at massive scale using language modeling
TLDR
This paper builds the first giga-token probabilistic language model of source code, based on 352 million lines of Java, and proposes new metrics that measure the complexity of a code module and the topical centrality of a module to a software project.
Constrained Graph Variational Autoencoders for Molecule Design
TLDR
A variational autoencoder model in which both the encoder and the decoder are graph-structured is proposed, and it is shown that appropriate shaping of the latent space allows the model to design molecules that are (locally) optimal in desired properties.
Suggesting accurate method and class names
TLDR
A neural probabilistic language model for source code, specifically designed for the method-naming problem, is introduced, along with a variant that is, to the authors' knowledge, the first that can propose neologisms: names that have not appeared in the training corpus.
Structured Neural Summarization
TLDR
This work develops a framework that extends existing sequence encoders with a graph component able to reason about long-distance relationships in weakly structured data such as text, and shows that the resulting hybrid sequence-graph models outperform both pure sequence models and pure graph models on a range of summarization tasks.
A Survey of Machine Learning for Big Code and Naturalness
TLDR
This article presents a taxonomy based on the underlying design principles of each model, uses it to navigate the literature, and discusses cross-cutting and application-specific challenges and opportunities.
Deep learning type inference
TLDR
DeepTyper is proposed: a deep learning model that learns which types naturally occur in certain contexts and relations and can provide type suggestions, which can often be verified by the type checker even when it could not infer the type initially.
Learning natural coding conventions
TLDR
NATURALIZE, a framework that learns the style of a codebase and suggests revisions to improve stylistic consistency, is presented; it builds on recent work applying statistical natural language processing to source code.