Corpus ID: 234790100

Measuring Coding Challenge Competence With APPS

@article{Hendrycks2021MeasuringCC,
  title={Measuring Coding Challenge Competence With APPS},
  author={Dan Hendrycks and Steven Basart and Saurav Kadavath and Mantas Mazeika and Akul Arora and Ethan Guo and Collin Burns and Samir Puranik and Horace He and Dawn Song and Jacob Steinhardt},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.09938}
}
While programming is one of the most broadly applicable skills in modern society, modern machine learning models still cannot code solutions to basic problems. It can be difficult to accurately assess code generation performance, and there has been surprisingly little work on evaluating code generation in a way that is both flexible and rigorous. To meet this challenge, we introduce APPS, a benchmark for code generation. Unlike prior work in more restricted settings, our benchmark measures the…
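
APPS grades a generated program the way a programming-contest judge would: by running it on held-out input/output test cases and checking whether its output matches. The snippet below is a minimal sketch of that idea, not the paper's official evaluation harness; the file name `candidate_solution.py`, the test-case format, and the `run_candidate` helper are illustrative assumptions.

```python
# Minimal sketch of test-case-based grading for a generated Python program.
# Not the official APPS evaluation code; names and formats are assumptions.
import subprocess
import sys
from typing import List, Tuple


def run_candidate(source_path: str,
                  test_cases: List[Tuple[str, str]],
                  timeout_s: float = 4.0) -> float:
    """Return the fraction of (stdin, expected stdout) test cases passed."""
    passed = 0
    for stdin_text, expected_stdout in test_cases:
        try:
            proc = subprocess.run(
                [sys.executable, source_path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            continue  # treat a timeout as a failed test case
        if proc.returncode == 0 and proc.stdout.strip() == expected_stdout.strip():
            passed += 1
    return passed / len(test_cases) if test_cases else 0.0


if __name__ == "__main__":
    # Hypothetical usage: a generated solution expected to print the sum
    # of two integers read from stdin.
    tests = [("1 2\n", "3\n"), ("10 -4\n", "6\n")]
    print(run_candidate("candidate_solution.py", tests))
```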
