Learning to Represent Programs with Graphs
TLDR
We propose to use graphs to represent both the syntactic and semantic structure of code and use graph-based deep learning methods to learn to reason over program structures.
  • 252
  • 57
A Convolutional Attention Network for Extreme Summarization of Source Code
TLDR
We introduce an attentional neural network that employs convolution on the input tokens to detect local time-invariant and long-range topical attention features in a context-dependent way.
  • 253
  • 26
Suggesting accurate method and class names
TLDR
We introduce a neural probabilistic language model for source code that is specifically designed for the method naming problem.
  • 240
  • 23
Mining source code repositories at massive scale using language modeling
TLDR
The tens of thousands of high-quality open source software projects on the Internet raise the exciting possibility of studying software development by finding patterns across truly large source code repositories.
  • 208
  • 21
Learning natural coding conventions
TLDR
We present NATURALIZE, a framework that learns the style of a codebase, and suggests revisions to improve stylistic consistency.
  • 246
  • 17
Constrained Graph Variational Autoencoders for Molecule Design
TLDR
We propose a novel probabilistic model for graph generation that builds gated graph neural networks into the encoder and decoder of a variational autoencoder.
  • 142
  • 17
A Survey of Machine Learning for Big Code and Naturalness
TLDR
We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature.
  • 284
  • 12
Deep learning type inference
TLDR
We propose DeepTyper, a deep learning model that understands which types naturally occur in certain contexts and relations. It can provide type suggestions that the type checker can often verify, even when the checker could not infer the type initially.
  • 60
  • 12
Mining idioms from source code
TLDR
We present Haggis, a system for automatically mining code idioms from a corpus of previously written, idiomatic software projects.
  • 116
  • 10
Bimodal Modelling of Source Code and Natural Language
TLDR
We consider the problem of building probabilistic models that jointly model short natural language utterances and source code snippets.
  • 131
  • 9