What is it like to program with artificial intelligence?

@article{Sarkar2022WhatII,
  title={What is it like to program with artificial intelligence?},
  author={Advait Sarkar and Andrew D. Gordon and Carina Negreanu and Christian Poelitz and Sruti Srinivasa Ragavan and Benjamin G. Zorn},
  journal={ArXiv},
  year={2022},
  volume={abs/2208.06213}
}
Large language models, such as OpenAI's Codex and DeepMind's AlphaCode, can generate code to solve a variety of problems expressed in natural language. This technology has already been commercialised in at least one widely-used programming editor extension: GitHub Copilot. In this paper, we explore how programming with large language models (LLM-assisted programming) is similar to, and differs from, prior conceptualisations of programmer assistance. We draw upon publicly available experience…
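
To make the interaction concrete, here is a minimal sketch of the comment-driven completion style the abstract describes: the natural-language comment plays the role of the prompt, and the function body is the kind of completion a model such as Codex might propose. The example itself is hypothetical, not taken from the paper.

```python
# The natural-language comment acts as the prompt; an LLM-based assistant
# such as Copilot would propose the function body as an inline completion.

# Return the n-th Fibonacci number, computed iteratively.
def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(10))  # 55
```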

Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming

This work studied GitHub Copilot, developed CUPS (a taxonomy of 12 programmer activities common to AI code-completion systems), and conducted a study with 21 programmers who completed coding tasks and used a labeling tool to retrospectively label their sessions with CUPS.
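
The 12 CUPS states are not enumerated in this summary, so the sketch below uses a few invented activity labels; it only illustrates the kind of retrospective session labeling and per-state time accounting the study performed.

```python
from dataclasses import dataclass

# Invented activity labels standing in for CUPS states; the real taxonomy
# has 12 states, which are not listed in this summary.
STATES = {"writing_code", "verifying_suggestion", "prompt_crafting"}

@dataclass
class Segment:
    state: str       # the activity the programmer was engaged in
    seconds: float   # duration of the segment

def time_per_state(session: list[Segment]) -> dict[str, float]:
    """Aggregate retrospectively labeled segments into per-state costs."""
    totals: dict[str, float] = {}
    for seg in session:
        assert seg.state in STATES
        totals[seg.state] = totals.get(seg.state, 0.0) + seg.seconds
    return totals

session = [Segment("prompt_crafting", 12.0),
           Segment("verifying_suggestion", 8.5),
           Segment("writing_code", 40.0)]
print(time_per_state(session))
```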

Grounded Copilot: How Programmers Interact with Code-Generating Models

Interactions with programming assistants are bimodal: in acceleration mode, the programmer knows what to do next and uses Copilot to get there faster; in exploration mode, the programmer is unsure how to proceed and uses Copilot to explore their options.
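
A hedged sketch of the two modes, with a stubbed suggest() standing in for the model (no real assistant API is assumed): in acceleration the top suggestion is taken immediately; in exploration several candidates are requested and deliberated over.

```python
from typing import Callable

# Stub standing in for a code-generating model; a real assistant would
# return model completions rather than canned strings.
def suggest(prompt: str, n: int = 1) -> list[str]:
    return [f"candidate_{i} for: {prompt}" for i in range(n)]

def accelerate(prompt: str) -> str:
    # Acceleration: the programmer knows the goal and takes the top
    # suggestion to get there faster.
    return suggest(prompt, n=1)[0]

def explore(prompt: str, pick: Callable[[list[str]], str]) -> str:
    # Exploration: the programmer is unsure how to proceed, requests
    # several candidates, and deliberates before choosing one.
    return pick(suggest(prompt, n=3))

print(accelerate("sort a list of pairs by second element"))
print(explore("parse a date string", pick=lambda cands: cands[0]))
```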

References

SHOWING 1-10 OF 97 REFERENCES

Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions

This work systematically investigates the prevalence and conditions that can cause GitHub Copilot to recommend insecure code, and explores Copilot’s performance on three distinct code generation axes—examining how it performs given diversity of weaknesses, diversity of prompts, and diversity of domains.

Language Models are Few-Shot Learners

GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
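
"Few-shot" here means conditioning on worked examples placed in the prompt rather than fine-tuning the model. A minimal sketch of such a prompt for the word-unscrambling task mentioned above; the exact format is an assumption, and the model call is omitted.

```python
# A few-shot prompt: the model sees worked examples followed by a new
# query and is expected to continue the pattern ("skicts" -> "sticks").
prompt = """Unscramble the word.
elppa -> apple
tac -> cat
skicts ->"""

# completion = some_llm(prompt)  # hypothetical model call, no API assumed
print(prompt)
```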

Program Synthesis with Large Language Models

The limits of the current generation of large language models for program synthesis in general purpose programming languages are explored, finding that even the best models are generally unable to predict the output of a program given a specific input.
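
That evaluation can be phrased simply: execute the program on an input to obtain the ground-truth output, ask the model to predict the same output, and compare. A minimal sketch with the model call stubbed out:

```python
def run_program(source: str, x: int) -> str:
    """Execute the program and capture f(x) as the ground-truth output."""
    scope: dict = {}
    exec(source, scope)
    return str(scope["f"](x))

def model_predict(source: str, x: int) -> str:
    # Stub: a real experiment would prompt an LLM with the source and
    # input, then parse its predicted output.
    return "?"

program = "def f(x):\n    return x * x + 1\n"
truth = run_program(program, 4)   # "17"
guess = model_predict(program, 4)
print("correct" if guess == truth else f"wrong (expected {truth})")
```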

The effectiveness of pair programming: A meta-analysis

How Readable is Model-generated Code? Examining Readability and Visual Inspection of GitHub Copilot

The results suggest that model-generated code is comparable in complexity and readability to code written by human pair programmers, and eye-tracking data suggests, to a statistically significant level, that programmers direct less visual attention to model-generated code.
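
One way to operationalise "comparable in complexity" is cyclomatic complexity. The sketch below uses the radon library as an illustrative tool choice; the paper's own metrics are not specified in this summary.

```python
# pip install radon
from radon.complexity import cc_visit

source = """
def classify(n):
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"
"""

# cc_visit parses the source and reports cyclomatic complexity per block;
# comparing scores for human- and model-written solutions to the same task
# is one way to quantify the complexity claim above.
for block in cc_visit(source):
    print(block.name, block.complexity)
```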

End-user encounters with lambda abstraction in spreadsheets: Apollo’s bow or Achilles’ heel?

The value of computational abstractions to non-expert end-user programmers is contentious. We study reactions to the LAMBDA function in Microsoft Excel, which enables users to define their own functions…
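
For readers unfamiliar with it, an Excel LAMBDA such as =LAMBDA(x, x*2) can be invoked inline, e.g. =LAMBDA(x, x*2)(3), or bound to a name so end users can call it like a built-in function. The same lambda abstraction in Python, for comparison:

```python
# An anonymous function is defined, then bound to a name so it can be
# reused like a built-in; this is the abstraction LAMBDA brings to Excel.
double = lambda x: x * 2
print(double(3))  # 6
```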

Is GitHub Copilot a Substitute for Human Pair-programming? An Empirical Study

Saki Imai. 2022 IEEE/ACM 44th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), 2022.

The results suggest that although Copilot increases productivity as measured by lines of code added, the quality of the code produced is inferior, as more of those lines are deleted in the subsequent trial.
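
The productivity measure described there (lines added, then deleted in a later trial) amounts to a churn calculation. A minimal sketch using difflib, with made-up snippets:

```python
import difflib

def churn(before: str, after: str) -> tuple[int, int]:
    """Count lines added and deleted between two versions of a file."""
    added = deleted = 0
    for line in difflib.ndiff(before.splitlines(), after.splitlines()):
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            deleted += 1
    return added, deleted

v1 = "a = 1\nb = 2\nprint(a + b)\n"
v2 = "a = 1\nprint(a)\n"
print(churn(v1, v2))  # (1, 2): one line added, two deleted
```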

Productivity assessment of neural code completion

It is found that the rate at which shown suggestions are accepted, rather than more specific metrics regarding the persistence of completions in the code over time, drives developers' perception of productivity.
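
The contrast drawn is between a simple shown-versus-accepted ratio and persistence-style metrics. A sketch of the acceptance-rate computation; the field names are illustrative, not the telemetry schema from the study.

```python
# Each event records whether a shown completion was accepted; the field
# names are invented for illustration.
events = [
    {"shown": True, "accepted": True},
    {"shown": True, "accepted": False},
    {"shown": True, "accepted": True},
]

shown = sum(e["shown"] for e in events)
accepted = sum(e["accepted"] for e in events)
print(f"acceptance rate: {accepted / shown:.0%}")  # 67%
```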

Discovering the Syntax and Strategies of Natural Language Programming with Generative Language Models

GenLine, a natural language code synthesis tool backed by a large generative language model and a set of task-specific prompts that create or change code, is presented; findings indicate that while natural language code synthesis can sometimes provide a magical experience, participants still faced challenges.
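
The summary describes prompts specialised per task. A hedged sketch of what a "change code" template might look like; the format is invented for illustration and is not GenLine's actual prompt.

```python
# Invented template in the spirit of task-specific prompting; GenLine's
# real prompts are not reproduced in this summary.
CHANGE_CODE = (
    "Here is some code:\n{code}\n"
    "Apply this change: {instruction}\n"
    "Changed code:\n"
)

print(CHANGE_CODE.format(code="x = [3, 1, 2]",
                         instruction="sort the list"))
```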
...