Contextualized Code Representation Learning for Commit Message Generation
@article{Nie2020ContextualizedCR,
  title={Contextualized Code Representation Learning for Commit Message Generation},
  author={Lun Yiu Nie and Cuiyun Gao and Zhicong Zhong and Wai Lam and Yang Liu and Zenglin Xu},
  journal={Neurocomputing},
  year={2020},
  volume={459},
  pages={97-107}
}
13 Citations
A large-scale empirical study of commit message generation: models, datasets and evaluation
- Computer Science, Empirical Software Engineering
- 2022
This paper conducts a systematic and in-depth analysis of the state-of-the-art models and datasets for automatic commit message generation, and collects MCMD, a large-scale, information-rich, multi-programming-language dataset.
Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit
- Computer Science, IEEE Transactions on Reliability
- 2022
DTrans introduces dynamically relative position encoding into the multi-head attention of the Transformer, and both generates patches and locates the lines to change more accurately than state-of-the-art methods.
FIRA: Fine-Grained Graph-Based Code Change Representation for Automated Commit Message Generation
- Computer Science, 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)
- 2022
A novel commit message generation technique, FIRA, which first represents code changes via fine-grained graphs and then learns to generate commit messages automatically, outperforming state-of-the-art techniques in terms of BLEU, ROUGE-L, and METEOR.
RACE: Retrieval-Augmented Commit Message Generation
- Computer Science
- 2022
RACE is proposed, a new retrieval-augmented neural commit message generation method, which treats the retrieved similar commit as an exemplar and leverages it to generate an accurate commit message.
What Makes a Good Commit Message?
- Computer Science, 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE)
- 2022
A taxonomy based on recurring patterns in commit messages' expressions is developed, investigating whether “good” commit messages can be automatically identified and whether such automation could prompt developers to write better commit messages.
Code Structure Guided Transformer for Source Code Summarization
- Computer Science, ACM Transactions on Software Engineering and Methodology
- 2022
This paper proposes a novel approach named SG-Trans to incorporate code structural properties into Transformer, which injects the local symbolic information and global syntactic structure into the self-attention module of Transformer as inductive bias to capture the hierarchical characteristics of code.
A Survey on Machine Learning Techniques for Source Code Analysis
- Computer Science, arXiv
- 2021
Jointly Learning to Repair Code and Generate Commit Message
- Computer Science, EMNLP
- 2021
This work proposes a joint model that can both repair the program code and generate the commit message in a unified framework and enhances the cascaded method with different training approaches, including the teacher-student method, the multi-task method, and the back-translation method.
Disentangled Code Representation Learning for Multiple Programming Languages
- Computer Science, Findings
- 2021
The experimental results validate the superiority of the proposed disentangled code representation learning approach, compared to several baselines, across three types of downstream tasks, i.e., code clone detection, code translation, and code-to-code search.
CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model
- Computer Science, NLP4PROG
- 2021
The work develops a model that automatically writes the commit message, and releases a dataset of 345K pairs of code modifications and commit messages in six programming languages.
References
Showing 1-10 of 46 references
Generating Commit Messages from Diffs using Pointer-Generator Network
- Computer Science, 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)
- 2019
PtrGNCMsg, a novel approach based on an improved sequence-to-sequence model with a pointer-generator network that translates code diffs into commit messages, outperforms recent approaches based on neural machine translation and is the first to enable the prediction of out-of-vocabulary (OOV) words.
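The OOV mechanism named above works by mixing the decoder's vocabulary distribution with a copy distribution over source tokens. A minimal sketch of that mixture, assuming the standard pointer-generator formulation (all names here are hypothetical, not the paper's implementation):

```python
def pointer_generator_dist(p_gen, vocab_dist, attention, src_tokens, vocab):
    """Mix generation and copying: with probability p_gen sample from the
    decoder vocabulary, otherwise copy a source token via attention weights.
    Source tokens outside the vocabulary extend it, enabling OOV prediction."""
    extended = list(vocab)
    for tok in src_tokens:
        if tok not in extended:
            extended.append(tok)
    index = {tok: i for i, tok in enumerate(extended)}
    dist = [p_gen * p for p in vocab_dist] + [0.0] * (len(extended) - len(vocab))
    for tok, attn in zip(src_tokens, attention):
        dist[index[tok]] += (1.0 - p_gen) * attn  # copy probability mass
    return extended, dist

# An OOV identifier from the diff, e.g. "parseConfig", gets nonzero probability.
ext, dist = pointer_generator_dist(
    0.5, [0.7, 0.3], [0.4, 0.6], ["fix", "parseConfig"], ["fix", "bug"])
```

Because both component distributions sum to one, the mixture does as well, so the extended distribution can be decoded like an ordinary softmax output.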
Neural-Machine-Translation-Based Commit Message Generation: How Far Are We?
- Computer Science, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE)
- 2018
A simpler and faster approach is proposed, named NNGen (Nearest Neighbor Generator), to generate concise commit messages using the nearest neighbor algorithm, which is over 2,600 times faster than NMT, and outperforms NMT in terms of BLEU by 21%.
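The nearest-neighbor idea is simple enough to sketch. This is a deliberately simplified illustration (the actual NNGen additionally re-ranks top candidates by BLEU; the bag-of-words representation and names here are assumptions):

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(n * b[t] for t, n in a.items())
    norm = sqrt(sum(n * n for n in a.values())) * sqrt(sum(n * n for n in b.values()))
    return dot / norm if norm else 0.0

def nngen(query_diff, train_diffs, train_msgs):
    """Reuse the commit message of the most similar training diff."""
    q = Counter(query_diff.split())
    scores = [cosine(q, Counter(d.split())) for d in train_diffs]
    return train_msgs[max(range(len(scores)), key=scores.__getitem__)]
```

No training is needed, which is where the reported speedup over NMT comes from: generation is a single similarity search over the training set.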
Automatically generating commit messages from diffs using neural machine translation
- Computer Science, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE)
- 2017
This paper adapts Neural Machine Translation (NMT) to automatically "translate" diffs into commit messages, and designs a quality-assurance filter that detects cases in which the algorithm is unable to produce a good message and returns a warning instead.
A Transformer-based Approach for Source Code Summarization
- Computer Science, ACL
- 2020
This work explores the Transformer model, which uses a self-attention mechanism and has been shown to be effective in capturing long-range dependencies, for source code summarization, and shows that despite its simplicity the approach outperforms state-of-the-art techniques by a significant margin.
SCELMo: Source Code Embeddings from Language Models
- Computer Science, arXiv
- 2020
It is shown that even a low-dimensional embedding trained on a relatively small corpus of programs can improve a state-of-the-art machine learning system for bug detection.
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
- Computer Science, Findings
- 2020
This work develops CodeBERT with Transformer-based neural architecture, and trains it with a hybrid objective function that incorporates the pre-training task of replaced token detection, which is to detect plausible alternatives sampled from generators.
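Replaced token detection, as described, corrupts an input by swapping some tokens for plausible alternatives proposed by a generator, and trains a discriminator to label each position as original or replaced. A toy sketch of the example construction (in the real objective the generator is a small masked language model, not a random draw; all names are hypothetical):

```python
import random

def make_rtd_example(tokens, generator_samples, replace_prob=0.15, seed=0):
    """Build one replaced-token-detection example: a corrupted sequence plus
    per-position labels (1 = replaced, 0 = original)."""
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < replace_prob:
            swap = rng.choice(generator_samples)
            corrupted.append(swap)
            # If the sampled token happens to equal the original, it still
            # counts as original -- the discriminator sees no change.
            labels.append(0 if swap == tok else 1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels
```

The discriminator then receives `corrupted` and is trained to predict `labels`, giving a loss signal at every position rather than only at masked ones.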
Incorporating BERT into Neural Machine Translation
- Computer Science, ICLR
- 2020
A new algorithm named BERT-fused model is proposed, in which BERT is first used to extract representations for an input sequence, and then the representations are fused with each layer of the encoder and decoder of the NMT model through attention mechanisms.
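The fusion step can be pictured as each layer attending both to its own hidden states and to the BERT representations, then combining the two results. A minimal single-query sketch in plain Python; the plain average at the end is a stand-in for the paper's learned fusion, and all names are assumptions:

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)  # subtract max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    weights = [w / z for w in weights]
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

def bert_fused_step(h, enc_states, bert_states):
    """Fuse attention over the layer's own states with attention over the
    BERT representations by simple averaging (simplified)."""
    a_self = attend(h, enc_states, enc_states)
    a_bert = attend(h, bert_states, bert_states)
    return [(x + y) / 2.0 for x, y in zip(a_self, a_bert)]
```

The key property is that BERT's contextual representations enter every layer through their own attention path, rather than only initializing the embeddings.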
Commit Message Generation for Source Code Changes
- Computer Science, IJCAI
- 2019
This paper first extracts both code structure and code semantics from the source code changes, and then jointly models these two sources of information so as to better learn the representations of the code changes.
A Novel Neural Source Code Representation Based on Abstract Syntax Tree
- Computer Science, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)
- 2019
This paper proposes a novel AST-based Neural Network (ASTNN) for source code representation that splits each large AST into a sequence of small statement trees, and encodes the statement trees to vectors by capturing the lexical and syntactical knowledge of statements.
MASS: Masked Sequence to Sequence Pre-training for Language Generation
- Computer Science, ICML
- 2019
This work proposes MAsked Sequence to Sequence pre-training (MASS) for the encoder-decoder based language generation tasks, which achieves the state-of-the-art accuracy on the unsupervised English-French translation, even beating the early attention-based supervised model.
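The objective masks one contiguous span on the encoder side and trains the decoder to reconstruct exactly that span. A toy sketch of the data preparation, assuming a fixed span ratio (the mask token and names are placeholders):

```python
import random

def mass_mask(tokens, mask_token="[MASK]", span_ratio=0.5, seed=0):
    """Mask one contiguous span: the encoder sees the masked sequence,
    the decoder is trained to emit the hidden span."""
    rng = random.Random(seed)
    k = max(1, int(len(tokens) * span_ratio))
    start = rng.randrange(len(tokens) - k + 1)
    encoder_input = tokens[:start] + [mask_token] * k + tokens[start + k:]
    decoder_target = tokens[start:start + k]
    return encoder_input, decoder_target
```

Predicting a contiguous span forces the decoder to rely on the encoder's context rather than on already-visible neighboring tokens, which is what makes the scheme suit encoder-decoder generation tasks.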