Using Pre-Trained Models to Boost Code Review Automation

@inproceedings{tufano2022boost,
  title={Using Pre-Trained Models to Boost Code Review Automation},
  author={Rosalia Tufano and Simone Masiero and Antonio Mastropaolo and Luca Pascarella and Denys Poshyvanyk and Gabriele Bavota},
  booktitle={2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)},
  year={2022}
}
Code review is a practice widely adopted in open source and industrial projects. Given the non-negligible cost of such a process, researchers started investigating the possibility of automating specific code review tasks. We recently proposed Deep Learning (DL) models targeting the automation of two tasks: the first model takes as input a code submitted for review and implements in it changes likely to be recommended by a reviewer; the second takes as input the submitted code and a reviewer… 

AUGER: Automatically Generating Review Comments with Pre-training Models

This paper proposes AUGER (AUtomatically GEnerating Review comments): a review-comment generator built on pre-trained models that synthesizes valuable knowledge in the training stage and outperforms baselines by 37.38% in ROUGE-L.
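For reference, the ROUGE-L metric behind the 37.38% improvement figure scores a generated comment against a reference comment via their longest common subsequence. A minimal token-level sketch (the β=1.2 weighting is a common choice, not necessarily AUGER's exact configuration):

```python
def lcs_length(a, b):
    # Classic dynamic-programming longest common subsequence over tokens.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference, beta=1.2):
    # ROUGE-L F-score: LCS-based precision/recall combined with weight beta.
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(cand), lcs / len(ref)
    return (1 + beta ** 2) * p * r / (r + beta ** 2 * p)
```

An identical candidate and reference score 1.0; partial token overlap yields a score strictly between 0 and 1.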

CodeReviewer: Pre-Training for Automating Code Review Activities

This research proposes CodeReviewer, a pre-trained model that utilizes four pre-training tasks tailored specifically for the code review scenario, and establishes a high-quality benchmark dataset based on the collected data for these three tasks.

CoditT5: Pretraining for Source Code and Natural Language Editing

A novel pretraining objective is proposed which explicitly models edits and used to build CoditT5, a large language model for software-related editing tasks that is pretrained on large amounts of source code and natural language comments.
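To illustrate what "explicitly modeling edits" can look like, here is a minimal sketch that linearizes a token-level diff into a tagged edit sequence. The tag names and output format are illustrative assumptions, not CoditT5's actual target representation:

```python
import difflib

def edit_sequence(old_tokens, new_tokens):
    # Turn a token-level diff into a flat, tagged edit sequence that a
    # sequence-to-sequence model could be trained to generate.
    ops = []
    sm = difflib.SequenceMatcher(a=old_tokens, b=new_tokens)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append("<keep> " + " ".join(old_tokens[i1:i2]))
        elif tag == "delete":
            ops.append("<delete> " + " ".join(old_tokens[i1:i2]))
        elif tag == "insert":
            ops.append("<insert> " + " ".join(new_tokens[j1:j2]))
        else:  # replace
            ops.append("<replace-old> " + " ".join(old_tokens[i1:i2])
                       + " <replace-new> " + " ".join(new_tokens[j1:j2]))
    return " ".join(ops)
```

For example, changing `int x = 0 ;` into `int x = 1 ;` yields a sequence that keeps the shared prefix and emits a replace operation for the changed literal.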

Towards Automating Code Review Activities

This work takes a first step towards partially automating the code review process, and thus reducing its manual cost, by training two different Deep Learning architectures.

Towards automating code review at scale

This work examines the vision and challenges of automating code review at realistic scale, and focuses on predicting just the locations of comments, which are quite rare.

Automatic Code Review by Learning the Revision of Source Code

Experimental results on six open source software projects indicate that, by learning revision features, DACE can outperform the competing approaches in automatic code review.

Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks

This paper empirically investigated how the T5 model performs when pre-trained and fine-tuned to support code-related tasks, and compared the performance of this single model with the results reported in the four original papers proposing DL-based solutions for those four tasks.
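The text-to-text framing underlying this T5 study can be sketched as follows: every code-related task is reduced to string-to-string pairs distinguished by a task prefix, so one model can be fine-tuned on several tasks at once. The task names and prefixes below are hypothetical, not the ones used in the paper:

```python
def to_text_to_text(task, source, target):
    # Map a (task, input, output) triple to a prefixed text-to-text pair.
    # Prefix strings are illustrative placeholders, not the paper's.
    prefixes = {
        "bug_fixing": "fix bug: ",
        "code_summarization": "summarize code: ",
        "comment_generation": "generate review comment: ",
    }
    return prefixes[task] + source, target

# All tasks share one input/output format, so a single model handles them all.
pair = to_text_to_text("bug_fixing", "if (x = 0) { }", "if (x == 0) { }")
```

The prefix tells the model which task to perform, which is what makes a single multi-task model comparable against four task-specific baselines.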

On Multi-Modal Learning of Editing Source Code

Modit, a multi-modal NMT-based code editing engine, shows that developers' hints, used as an input modality, can narrow the search space for patches and outperform state-of-the-art models at generating correctly patched code in the top-1 position.

Unit Test Case Generation with Transformers

This paper proposes AthenaTest, an approach that aims at generating unit test cases by learning from real-world, developer-written test cases, relying on a state-of-the-art sequence-to-sequence transformer model able to write useful test cases for a given method under test.

DeepJIT: An End-to-End Deep Learning Framework for Just-in-Time Defect Prediction

This paper proposes an end-to-end deep learning framework, named DeepJIT, that automatically extracts features from commit messages and code changes and uses them to identify defects.

An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation

An empirical study to assess the feasibility of using Neural Machine Translation techniques for learning bug-fixing patches for real defects finds that such a model is able to fix thousands of unique buggy methods in the wild.
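A common preprocessing step in this NMT line of work is abstracting concrete identifiers into placeholder tokens, which shrinks the vocabulary the model must learn. A simplified sketch (the keyword list and `VAR_n` placeholder scheme are illustrative assumptions, not the exact abstraction used in the study):

```python
import re

# Small illustrative keyword set; a real abstraction step would use the
# full lexical grammar of the target language.
JAVA_KEYWORDS = {"int", "if", "return", "void", "public", "else", "while"}

def abstract_code(tokens):
    # Replace each distinct identifier with a numbered placeholder,
    # keeping keywords, operators, and punctuation unchanged.
    mapping, out = {}, []
    for tok in tokens:
        if tok in JAVA_KEYWORDS or not re.match(r"[A-Za-z_]\w*$", tok):
            out.append(tok)
        else:
            if tok not in mapping:
                mapping[tok] = f"VAR_{len(mapping) + 1}"
            out.append(mapping[tok])
    return out, mapping
```

Repeated identifiers map to the same placeholder, so the buggy and fixed versions of a method share a consistent abstract vocabulary, and the mapping can be inverted to restore concrete names after translation.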

A Systematic Literature Review on the Use of Deep Learning in Software Engineering Research

A systematic literature review of research at the intersection of SE & DL, from its modern inception to the present, that delineates the foundations of DL techniques applied to SE research and highlights likely areas of fertile exploration for the future.

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.