Learning to Represent Programs with Graphs
This work proposes using graphs to represent both the syntactic and semantic structure of code, applies graph-based deep learning methods to reason over program structure, and shows that these models learn to infer meaningful names and to solve the VarMisuse task in many cases.
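To make the idea concrete, here is a minimal sketch of turning a small program into a graph with syntactic (AST child) and semantic (last-use) edges. The edge types and construction are simplified illustrations, not the paper's exact edge set.

```python
# Sketch: represent a program as a graph with syntactic "Child" edges
# (AST structure) and semantic "LastUse" edges (linking successive
# occurrences of the same variable). Simplified for illustration.

import ast
from collections import defaultdict

def build_program_graph(source: str):
    """Return (nodes, edges) where edges maps edge type -> [(src, dst), ...]."""
    tree = ast.parse(source)
    nodes, edges = [], defaultdict(list)
    last_use = {}  # variable name -> node id of its most recent occurrence

    def visit(node, parent_id=None):
        node_id = len(nodes)
        nodes.append(type(node).__name__)
        if parent_id is not None:
            edges["Child"].append((parent_id, node_id))  # syntactic edge
        if isinstance(node, ast.Name):
            if node.id in last_use:  # semantic edge to previous occurrence
                edges["LastUse"].append((node_id, last_use[node.id]))
            last_use[node.id] = node_id
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(tree)
    return nodes, dict(edges)

nodes, edges = build_program_graph("x = 1\ny = x + x")
```

On this two-line snippet, both occurrences of `x` on the second line get a `LastUse` edge back to the previous occurrence, which is exactly the kind of data-flow signal a token sequence alone does not expose.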
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
- H. Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, M. Brockschmidt
- Computer Science · ArXiv
- 20 September 2019
The methodology used to obtain the corpus and expert labels is described, along with a number of simple baseline solutions for the task.
A Convolutional Attention Network for Extreme Summarization of Source Code
An attentional neural network is introduced that applies convolution over the input tokens to detect local, time-invariant and long-range topical attention features in a context-dependent way, addressing the problem of extreme summarization of source code snippets into short, descriptive, function-name-like summaries.
Constrained Graph Variational Autoencoders for Molecule Design
A variational autoencoder in which both encoder and decoder are graph-structured is proposed, and it is shown that appropriate shaping of the latent space allows the model to design molecules that are (locally) optimal in desired properties.
Suggesting accurate method and class names
- Miltiadis Allamanis, Earl T. Barr, C. Bird, Charles Sutton
- Computer Science · ESEC/SIGSOFT FSE
- 30 August 2015
A neural probabilistic language model for source code that is specifically designed for the method naming problem is introduced, along with a variant that is, to our knowledge, the first that can propose neologisms: names that have not appeared in the training corpus.
Mining source code repositories at massive scale using language modeling
- Miltiadis Allamanis, Charles Sutton
- Computer Science · 10th Working Conference on Mining Software…
- 18 May 2013
This paper builds the first giga-token probabilistic language model of source code, based on 352 million lines of Java, and proposes new metrics that measure the complexity of a code module and the topical centrality of a module to a software project.
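The kind of probabilistic language model the paper builds can be illustrated at toy scale with a count-based trigram model over code tokens (the actual work trains on 352 million lines of Java; the smoothing and tooling here are omitted for brevity).

```python
# Toy illustration of an n-gram language model over source-code tokens.
# Nothing like giga-token scale: maximum-likelihood trigram counts only,
# with no smoothing.

from collections import Counter

def train_trigram_model(token_stream):
    """Count trigrams and their bigram contexts."""
    trigrams, bigrams = Counter(), Counter()
    padded = ["<s>", "<s>"] + list(token_stream)
    for a, b, c in zip(padded, padded[1:], padded[2:]):
        trigrams[(a, b, c)] += 1
        bigrams[(a, b)] += 1
    return trigrams, bigrams

def prob(trigrams, bigrams, a, b, c):
    """P(c | a, b) by maximum likelihood estimation."""
    return trigrams[(a, b, c)] / bigrams[(a, b)] if bigrams[(a, b)] else 0.0

tokens = "for i in range ( n ) :".split()
tri, bi = train_trigram_model(tokens)
p = prob(tri, bi, "i", "in", "range")  # 1.0 in this tiny one-line corpus
```

Low per-token probability under such a model is one plausible ingredient for the paper's complexity metrics: code a model finds surprising is, in this sense, complex.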
A Survey of Machine Learning for Big Code and Naturalness
- Miltiadis Allamanis, Earl T. Barr, Premkumar T. Devanbu, Charles Sutton
- Computer Science · ACM Comput. Surv.
- 18 September 2017
This article presents a taxonomy based on the underlying design principles of each model and uses it to navigate the literature and discuss cross-cutting and application-specific challenges and opportunities.
Structured Neural Summarization
This work develops a framework that extends existing sequence encoders with a graph component able to reason about long-distance relationships in weakly structured data such as text, and shows that the resulting hybrid sequence-graph models outperform both pure sequence models and pure graph models on a range of summarization tasks.
Deep learning type inference
- V. Hellendoorn, C. Bird, Earl T. Barr, Miltiadis Allamanis
- Computer Science · ESEC/SIGSOFT FSE
- 26 October 2018
DeepTyper is proposed: a deep learning model that learns which types naturally occur in certain contexts and relations, and provides type suggestions that can often be verified by the type checker even when it could not infer the type on its own.
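The core intuition, that surrounding tokens predict an identifier's type, can be sketched with a much-simplified, non-neural stand-in: a lookup table from token contexts to observed types. The real model is a deep bidirectional recurrent network over whole token sequences; the context shape and training data below are hypothetical.

```python
# Non-neural stand-in for the context-to-type intuition behind DeepTyper:
# suggest a type for an identifier from the tokens adjacent to it,
# learned from (context, type) examples. Illustration only.

from collections import Counter, defaultdict

def train(examples):
    """examples: iterable of ((left_token, right_token), type_name)."""
    table = defaultdict(Counter)
    for context, type_name in examples:
        table[context][type_name] += 1
    return table

def suggest(table, context):
    """Most frequent type observed in this context, or None if unseen."""
    counts = table.get(context)
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical training data, e.g. from annotated TypeScript:
examples = [
    (("let", "="), "number"),    # e.g. `let n: number = 0`
    (("let", "="), "number"),
    (("const", "="), "string"),  # e.g. `const s: string = ""`
]
model = train(examples)
guess = suggest(model, ("let", "="))  # "number"
```

Suggestions from such a model are fallible, which is why the paper pairs them with a type checker to verify candidates rather than trusting them outright.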
Learning natural coding conventions
NATURALIZE, a framework that learns the style of a codebase and suggests revisions to improve stylistic consistency, is presented; it builds on recent work applying statistical natural language processing to source code.