Competition-Level Code Generation with AlphaCode

Yujia Li*, David Choi*, Junyoung Chung*, Nate Kushman*, Julian Schrittwieser*, Rémi Leblond*, Tom Eccles*, James Keeling*, Felix Gimeno*, Agustin Dal Lago*, Thomas Hubert*, Peter Choy*, Cyprien de Masson d’Autume*, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, Pushmeet Kohli, Nando de Freitas, Koray Kavukcuoglu and Oriol Vinyals *Joint first authors 

An Empirical Study of Code Smells in Transformer-based Code Generation Techniques

Pylint and Bandit are used to investigate the extent to which code smells are present in the datasets of code generation techniques and whether they leak into the output of these techniques.

Natural Language to Code Translation with Execution

This work introduces execution result-based minimum Bayes risk decoding (MBR-EXEC) for program selection and shows that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks, suggesting it as an effective approach for natural language to code translation.

PanGu-Coder: Program Synthesis with Function-Level Language Modeling

A pretrained decoder-only language model adopting the PanGu-α architecture is presented for text-to-code generation, i.e., the synthesis of programming language solutions given a natural language problem description.

Improving automatically generated code from Codex via Automated Program Repair

This study systematically investigates whether automated program repair (APR) techniques can fix the incorrect solutions produced by language models in LeetCode contests, revealing that automatically generated code shares some common programming mistakes with human-crafted solutions, which indicates that existing APR tools have the potential to fix auto-generated code.

An Empirical Evaluation of Competitive Programming AI: A Case Study of AlphaCode

An empirical study of code similarity and performance differences between AlphaCode-generated code and human code shows that code generated by AlphaCode is similar to human code and performs on par with or worse than human code in terms of execution time and memory usage.

SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques

SecurityEval is described, an evaluation dataset that contains 130 samples for 75 vulnerability types, which are mapped to the Common Weakness Enumeration (CWE) and demonstrated using one open-source and one closed-source code generation model.

CCTEST: Testing and Repairing Code Completion Systems

This research proposes CCTEST, a framework to test and repair code completion systems in black-box settings, which features a novel mutation strategy, namely program structure-consistency (PSC) mutations, to generate mutated code completion inputs.

MP-CodeCheck: Evolving Logical Expression Code Anomaly Learning with Iterative Self-Supervision

This work presents MP-CodeCheck, an MP system that tries to identify anomalous code patterns within logical program expressions and compares it against ControlFlag, a state-of-the-art self-supervised code anomaly detection system; it is found that MPCC is more spatially and temporally efficient.

Understanding High-Level Properties of Low-Level Programs Through Transformers

It is shown that Transformer models can translate C to LLVM-IR with high accuracy, by training on a parallel corpus of functions extracted from 1 million compilable, open-sourced C programs and their corresponding LLVM-IR after compiling with Clang.

InCoder: A Generative Model for Code Infilling and Synthesis

INCODER is introduced, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling) and the ability to condition on bidirectional context substantially improves performance on challenging tasks such as type inference, comment generation, and variable re-naming.



Evaluating Large Language Models Trained on Code

It is found that repeated sampling from the GPT language model is a surprisingly effective strategy for producing working solutions to difficult prompts, and the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics are discussed.

The rust language

Rust's static type system is safe and expressive and provides strong guarantees about isolation, concurrency, and memory safety, and Rust's type system and runtime guarantee the absence of data races, buffer overflows, stack overflows and accesses to uninitialized or deallocated memory.

Code completion with statistical language models

The main idea is to reduce the problem of code completion to a natural-language processing problem of predicting probabilities of sentences, and design a simple and scalable static analysis that extracts sequences of method calls from a large codebase, and index these into a statistical language model.

How Program History Can Improve Code Completion

  • R. Robbes, Michele Lanza
  • Computer Science
    2008 23rd IEEE/ACM International Conference on Automated Software Engineering
  • 2008
A benchmark measuring the accuracy and usefulness of a code completion engine is defined and an alternative interface for completion tools is proposed, which helps improve the results offered by code completion tools.

Latent Predictor Networks for Code Generation

A novel neural network architecture is presented which generates an output sequence conditioned on an arbitrary number of input functions and allows both the choice of conditioning context and the granularity of generation, for example characters or tokens, to be marginalised, thus permitting scalable and effective training.

Program Synthesis with Large Language Models

The limits of the current generation of large language models for program synthesis in general purpose programming languages are explored, finding that even the best models are generally unable to predict the output of a program given a specific input.

Measuring Coding Challenge Competence With APPS

APPS is introduced, a benchmark for code generation that measures the ability of models to take an arbitrary natural language specification and generate satisfactory Python code and shows that machine learning models are now beginning to learn how to code.

Ten Lessons From Three Generations Shaped Google’s TPUv4i: Industrial Product

Google has deployed several TPU generations since 2015, teaching lessons that changed its views: semiconductor technology advances unequally, compiler compatibility trumps binary compatibility, especially for VLIW domain-specific architectures (DSA), and backwards ML compatibility helps deploy DNNs quickly.

PyMT5: Multi-mode Translation of Natural Language and Python Code with Transformers

This work introduces PyMT5, the Python method text-to-text transfer transformer, which is trained to translate between all pairs of Python method feature combinations: a single model that can both predict whole methods from natural language documentation strings (docstrings) and summarize code into docstrings of any common style.

Learning Autocompletion from Real-World Datasets

  • Gareth Ari Aye, Seohyun Kim, Hongyu Li
  • Computer Science
    2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
  • 2021
This study characterizes a large corpus of logged autocompletion usages to investigate why training on real-world examples leads to stronger models.