Competition-level code generation with AlphaCode

@article{Li2022CompetitionlevelCG,
  title={Competition-level code generation with AlphaCode},
  author={Yujia Li and David H. Choi and Junyoung Chung and Nate Kushman and Julian Schrittwieser and R{\'e}mi Leblond and Tom Eccles and James Keeling and Felix Gimeno and Agustin Dal Lago and Thomas Hubert and Peter Choy and Cyprien de Masson d’Autume and Igor Babuschkin and Xinyun Chen and Po-Sen Huang and Johannes Welbl and Sven Gowal and Alexey Cherepanov and James Molloy and Daniel Jaymin Mankowitz and Esme Sutherland Robson and Pushmeet Kohli and Nando de Freitas and Koray Kavukcuoglu and Oriol Vinyals},
  journal={Science},
  year={2022},
  volume={378},
  pages={1092--1097}
}
Programming is a powerful and ubiquitous problem-solving tool. Systems that can assist programmers or even generate programs themselves could make programming more productive and accessible. Recent transformer-based neural network models show impressive code generation abilities yet still perform poorly on more complex tasks requiring problem-solving skills, such as competitive programming problems. Here, we introduce AlphaCode, a system for code generation that achieved an average ranking in… 
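
The full paper describes a pipeline of large-scale sampling, filtering on the problem's example tests, and behavioural clustering of the surviving programs before a small number of submissions is chosen. A minimal sketch of that loop, assuming hypothetical helpers for the language model (`sample_programs`), the example-test runner (`passes_example_tests`), and a function that fingerprints a program's behaviour on generated inputs (`behaviour_signature`):

```python
def alphacode_style_pipeline(problem, sample_programs, passes_example_tests,
                             behaviour_signature, n_samples=1000, budget=10):
    """Sketch: sample many programs, filter, cluster by behaviour, submit a few."""
    candidates = sample_programs(problem, n=n_samples)
    survivors = [p for p in candidates if passes_example_tests(p, problem.examples)]

    # Group behaviourally identical programs so the submission budget is not
    # spent on many copies of the same (possibly wrong) idea.
    clusters = {}
    for program in survivors:
        clusters.setdefault(behaviour_signature(program), []).append(program)

    # Submit one representative from each of the largest clusters.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:budget]]
```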

Automated Feedback Generation for Competition-Level Code

This work presents Clef, the first data-driven tool that can generate feedback on competition-level code automatically by repairing programmers’ incorrect submissions, and introduces a new data structure, merge trees, to capture the changes between submissions.

AlphaCode and “data-driven” programming

The AlphaCode system is presented, representing a substantial step forward in the development of machine learning models that can synthesize computer programs to solve challenging competitive programming problems. Perhaps most surprising is what AlphaCode does not do: it contains no explicit built-in knowledge about the structure of computer code.

An Empirical Evaluation of Competitive Programming AI: A Case Study of AlphaCode

An empirical study of code similarities and performance differences between AlphaCode-generated code and human-written code shows that the generated code is similar to human code and performs on par with or worse than human code in terms of execution time and memory usage.

Language Models Can Teach Themselves to Program Better

This work shows how generating synthetic programming puzzles and solutions, verified for correctness by a Python interpreter, can be used to improve performance in solving test puzzles from P3, a public benchmark set of Python Programming Puzzles.
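
In the P3 format, "verified for correctness by a Python interpreter" is literal: a puzzle is a function `sat` that returns True exactly when its first argument is a valid answer, so checking a candidate solution is a single call. The puzzle below is a made-up illustration, not one from the benchmark:

```python
def sat(x: str, target: str = "Hello world") -> bool:
    """Puzzle: find a string, different from the target, whose capitalize() equals it."""
    return x.capitalize() == target and x != target

def verify(candidate) -> bool:
    """A candidate counts as correct iff the interpreter confirms sat(candidate) is True."""
    try:
        return sat(candidate) is True
    except Exception:
        return False

print(verify("hello world"))  # True: "hello world".capitalize() == "Hello world"
print(verify("Hello world"))  # False: the answer must differ from the target itself
```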

Improving automatically generated code from Codex via Automated Program Repair

This study systematically investigates whether automated program repair (APR) techniques can fix the incorrect solutions produced by language models in LeetCode contests, revealing that automatically generated code shares some common programming mistakes with human-crafted solutions and indicating that existing APR tools have the potential to fix auto-generated code.

Fault-Aware Neural Code Rankers

CodeRanker is a neural ranker that can predict the correctness of a sampled program without executing it, and it can significantly increase the pass@1 accuracy of various code generation models on the APPS, HumanEval, and MBPP datasets.
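
A hedged sketch of how such a ranker is used at inference time, with a hypothetical `score_correctness(source) -> float` standing in for the neural model: the sampled candidates are reordered by predicted correctness, without executing any of them, and the top-scoring program becomes the single pass@1 attempt.

```python
def rank_candidates(candidates: list[str], score_correctness) -> list[str]:
    """Order sampled programs by a learned correctness score, best first."""
    return sorted(candidates, key=score_correctness, reverse=True)

def pass_at_1_submission(candidates: list[str], score_correctness) -> str:
    """The pass@1 attempt is simply the highest-ranked candidate."""
    return rank_candidates(candidates, score_correctness)[0]
```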

Automated Repair of Programs from Large Language Models

The study revealed that automatically generated code shares common programming mistakes with human-crafted solutions, indicating that APR techniques have the potential to repair auto-generated code; given bug location information provided by a statistical fault localization approach, Codex edit mode is similar to or better than the existing Java repair tools TBar and Recoder in correcting incorrect solutions.
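
The "statistical fault localization" step can be illustrated with spectrum-based suspiciousness scores. The sketch below uses the standard Ochiai formula over per-test line coverage; this is an assumption about the general technique, not necessarily the exact tool used in the study:

```python
import math

def ochiai_scores(coverage: dict[int, set[str]], failing: set[str], passing: set[str]) -> dict[int, float]:
    """Rank source lines by Ochiai suspiciousness.

    `coverage` maps a line number to the set of test names executing it. Lines
    covered mostly by failing tests score highest and are the first candidates
    handed to a repair tool (Codex edit mode, TBar, Recoder, ...).
    """
    total_failing = len(failing)
    scores = {}
    for line, tests in coverage.items():
        failed_here = len(tests & failing)
        passed_here = len(tests & passing)
        denom = math.sqrt(total_failing * (failed_here + passed_here))
        scores[line] = failed_here / denom if denom else 0.0
    return scores

# Toy example: line 3 is executed only by the failing test, so it is most suspicious.
cov = {1: {"t1", "t2", "t3"}, 2: {"t1", "t2"}, 3: {"t3"}}
print(ochiai_scores(cov, failing={"t3"}, passing={"t1", "t2"}))
```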

Piloting Copilot and Codex: Hot Temperature, Cold Prompts, or Black Magic?

An investigation of the various input parameters of two language models shows that varying the input parameters can improve their performance, but that there is a tight dependency between the temperature, the prompt, and the number of generated solutions, potentially making it hard to control the parameters properly to obtain an optimal result.
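
A minimal sketch of the kind of sweep behind this finding, with a hypothetical `sample_solutions(prompt, temperature=..., n=...)` call standing in for Codex or Copilot: the solve rate has to be measured over the joint temperature and sample-budget grid, because the best temperature shifts as the number of generated solutions grows.

```python
from itertools import product

def solve_rate(problems, sample_solutions, is_correct, temperature, n):
    """Fraction of problems solved by at least one of n samples at this temperature."""
    solved = 0
    for prompt, reference_tests in problems:
        candidates = sample_solutions(prompt, temperature=temperature, n=n)
        if any(is_correct(code, reference_tests) for code in candidates):
            solved += 1
    return solved / len(problems)

def sweep(problems, sample_solutions, is_correct,
          temperatures=(0.0, 0.2, 0.4, 0.6, 0.8), budgets=(1, 10, 100)):
    """Explore temperature and sample budget jointly; neither can be tuned in isolation."""
    return {(t, n): solve_rate(problems, sample_solutions, is_correct, t, n)
            for t, n in product(temperatures, budgets)}
```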

Automated Repair of Code from Language Models

The study revealed that automatically generated code shares common programming mistakes with human-crafted solutions, indicating that APR techniques have the potential to repair auto-generated code; given bug location information provided by a statistical fault localization approach, Codex edit mode is similar to or better than the existing Java repair tools TBar and Recoder in correcting incorrect solutions.

Using GitHub Copilot to Solve Simple Programming Problems

Copilot, a natural language machine learning model trained on billions of lines of code, is evaluated on simple programming problems, and its generated suggestions are examined qualitatively to understand its limitations.
...

References

Showing 1-10 of 81 references

Neural Sketch Learning for Conditional Program Generation

This work trains a neural generator not on code but on program sketches, or models of program syntax that abstract out names and operations that do not generalize across programs, and shows that it can often predict the entire body of a method given just a few API calls or data types that appear in the method.
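
To make the notion of a sketch concrete, here is a loose, Python-flavoured illustration (the original work targets Java and Android APIs): identifiers and literals that do not generalize across programs are abstracted into placeholders, while API calls and types are kept.

```python
# A concrete method body ...
def read_first_line(path):
    with open(path, "r") as handle:
        return handle.readline().strip()

# ... and a hand-written sketch of the same body: variable names and literals are
# replaced by placeholders, while the API calls and result types are retained.
SKETCH = [
    "call open($var, $const) -> $handle",
    "call $handle.readline() -> str",
    "call str.strip() -> str",
    "return str",
]
```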

Program Synthesis with Large Language Models

The limits of the current generation of large language models for program synthesis in general purpose programming languages are explored, finding that even the best models are generally unable to predict the output of a program given a specific input.

Evaluating Large Language Models Trained on Code

It is found that repeated sampling from the GPT language model is a surprisingly effective strategy for producing working solutions to difficult prompts, and the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics are discussed.
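
The effect of repeated sampling is usually reported with the unbiased pass@k estimator introduced in this paper: generate n >= k samples per problem, count the c correct ones, and estimate the probability that at least one of k randomly drawn samples is correct.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).

    n: samples generated per problem, c: correct samples, k: evaluation budget.
    """
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

print(pass_at_k(n=200, c=2, k=1))    # ~0.01: a single draw rarely hits a correct sample
print(pass_at_k(n=200, c=2, k=100))  # ~0.75: repeated sampling makes success likely
```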

Measuring Coding Challenge Competence With APPS

APPS is introduced, a benchmark for code generation that measures the ability of models to take an arbitrary natural language specification and generate satisfactory Python code and shows that machine learning models are now beginning to learn how to code.
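
Problems in this style of benchmark pair a natural-language specification with hidden input/output test cases, so judging a generated solution largely reduces to running it on each test's stdin and comparing stdout. A minimal harness, assuming the stdin/stdout problem format:

```python
import subprocess

def judge(source: str, test_cases: list[tuple[str, str]], timeout_s: float = 4.0) -> bool:
    """Run a generated Python solution on (stdin, expected stdout) test cases."""
    for stdin_text, expected in test_cases:
        try:
            run = subprocess.run(
                ["python3", "-c", source],
                input=stdin_text, capture_output=True, text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False
        if run.returncode != 0 or run.stdout.strip() != expected.strip():
            return False
    return True

# Example: a "sum of n numbers" problem with tests [("3\n1 2 3\n", "6")] is solved by
# judge("input(); print(sum(map(int, input().split())))", [("3\n1 2 3\n", "6")])
```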

IntelliCode Compose: Code Generation Using Transformer

IntelliCode Compose is introduced – a general-purpose multilingual code completion tool which is capable of predicting sequences of code tokens of arbitrary types, generating up to entire lines of syntactically correct code.

Solving Probability and Statistics Problems by Program Synthesis

This work is the first to introduce a new dataset of university-level probability and statistics problems and solve these problems in a scalable fashion using the program synthesis capabilities of large language models.

Learning Autocompletion from Real-World Datasets

Gareth Ari Aye, Seohyun Kim, Hongyu Li. 2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2021.
This study characterizes a large corpus of logged autocompletion usages to investigate why training on real-world examples leads to stronger models.

RobustFill: Neural Program Learning under Noisy I/O

This work directly compares both approaches for automatic program learning on a large-scale, real-world learning task and demonstrates that the strength of each approach is highly dependent on the evaluation metric and end-user application.

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

This work develops CodeBERT with Transformer-based neural architecture, and trains it with a hybrid objective function that incorporates the pre-training task of replaced token detection, which is to detect plausible alternatives sampled from generators.
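
A small sketch of how replaced-token-detection training data can be built, assuming a hypothetical generator `generator_sample(tokens, position) -> str` that proposes a plausible alternative token: a few positions in the sequence are rewritten with the generator's samples, and the discriminator (CodeBERT here) learns to label every token as original or replaced.

```python
import random

def make_rtd_example(tokens: list[str], generator_sample, replace_prob: float = 0.15):
    """Build one replaced-token-detection example.

    Returns the corrupted token sequence and a 0/1 label per token (1 = replaced).
    """
    corrupted, labels = [], []
    for i, tok in enumerate(tokens):
        if random.random() < replace_prob:
            proposal = generator_sample(tokens, i)
            corrupted.append(proposal)
            labels.append(0 if proposal == tok else 1)  # an unchanged proposal counts as original
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels
```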

Learning from examples to improve code completion systems

Evidence is given that intelligent code completion systems which learn from examples significantly outperform mainstream code completion systems in terms of the relevance of their suggestions and thus have the potential to enhance developers' productivity.
...