A Survey of Machine Learning for Big Code and Naturalness

@article{Allamanis2018ASO,
  title={A Survey of Machine Learning for Big Code and Naturalness},
  author={Miltiadis Allamanis and Earl T. Barr and Premkumar T. Devanbu and Charles Sutton},
  journal={ACM Computing Surveys (CSUR)},
  year={2018},
  volume={51},
  pages={1--37}
}
Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit the abundance of patterns of code. [...] We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges…
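The abstract's starting point is that code is repetitive enough for probabilistic models to learn its patterns. One of the simplest models of that kind is a token-level n-gram language model, sketched below as a bigram model. This is an illustration only, not code from the survey: the tokenizer regex, the bigram order, add-one (Laplace) smoothing, the `<unk>` handling, and the names `BigramCodeModel`, `prob`, and `cross_entropy` are all assumptions. The model scores a snippet by its cross-entropy in bits per token, so lower values mean the model finds the code more predictable.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: identifiers, integer literals, or any single
# non-whitespace character (so "+=" becomes "+" followed by "=").
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
UNK = "<unk>"


def tokenize(code: str) -> list[str]:
    """Split source text into coarse tokens wrapped in start/end markers."""
    return ["<s>"] + TOKEN_RE.findall(code) + ["</s>"]


class BigramCodeModel:
    """Token-level bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, corpus: list[str]) -> None:
        self.context_counts = Counter()  # c(prev): times prev precedes a token
        self.bigram_counts = Counter()   # c(prev, tok): times tok follows prev
        self.vocab = {UNK}               # training tokens plus the unknown symbol
        for snippet in corpus:
            toks = tokenize(snippet)
            self.vocab.update(toks)
            self.context_counts.update(toks[:-1])
            self.bigram_counts.update(zip(toks, toks[1:]))

    def _norm(self, tok: str) -> str:
        # Map any token never seen in training to the single unknown symbol.
        return tok if tok in self.vocab else UNK

    def prob(self, prev: str, tok: str) -> float:
        """Smoothed P(tok | prev) = (c(prev, tok) + 1) / (c(prev) + |V|)."""
        prev, tok = self._norm(prev), self._norm(tok)
        numerator = self.bigram_counts[(prev, tok)] + 1
        return numerator / (self.context_counts[prev] + len(self.vocab))

    def cross_entropy(self, code: str) -> float:
        """Average -log2 P per predicted token, in bits (lower = more predictable)."""
        toks = tokenize(code)
        pairs = list(zip(toks, toks[1:]))
        return -sum(math.log2(self.prob(p, t)) for p, t in pairs) / len(pairs)


# Tiny illustrative corpus of repetitive loops.
corpus = [
    "for i in range(n): total += i",
    "for j in range(m): total += j",
    "for k in range(n): count += k",
]
model = BigramCodeModel(corpus)
familiar = model.cross_entropy("for i in range(m): total += i")
scrambled = model.cross_entropy("while total ) ( += : in for")
print(familiar < scrambled)  # True
```

On this toy corpus, a loop that follows the training pattern gets a lower cross-entropy than the same kind of tokens in a scrambled order. A practical model would need a much larger corpus, a higher n-gram order, and a stronger smoothing scheme than add-one.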
306 Citations
ComPy-Learn: A toolbox for exploring machine learning representations for compilers
Neural Networks for Modeling Source Code Edits
  • 7 citations
Learning to Represent Programs with Graphs
  • 276 citations
Neural Code Comprehension: A Learnable Representation of Code Semantics
  • 57 citations
Machine Learning in Compilers
  • 3 citations
Commit2Vec: Learning Distributed Representations of Code Changes
  • 2 citations
Machine Learning in Compiler Optimization
  • 78 citations

References

Showing 1-10 of 31 references
Structured Generative Models of Natural Source Code
  • 121 citations
  • Highly Influential
Learning to Fuzz: Application-Independent Fuzz Testing with Probabilistic, Generative Models of Input Data
  • 29 citations
  • Highly Influential
PHOG: Probabilistic Model for Code
  • 117 citations
  • Highly Influential
Code completion with statistical language models
  • 389 citations
  • Highly Influential
Deep Learning to Find Bugs
  • 20 citations
  • Highly Influential
Predicting Program Properties from "Big Code"
  • 272 citations
  • Highly Influential
Graph-Based Statistical Language Model for Code
  • A. Nguyen, T. Nguyen
  • 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering
  • 124 citations
  • Highly Influential
Summarizing Source Code using a Neural Attention Model
  • 245 citations
  • Highly Influential
NLyze: interactive programming by natural language for spreadsheet data analysis and manipulation
  • 89 citations
  • Highly Influential
Bayesian specification learning for finding API usage errors
  • 33 citations
  • Highly Influential