Code Summarization with Structure-induced Transformer

  • Hongqi Wu, Hai Zhao, Min Zhang
Code summarization (CS) is a promising area of language understanding that aims to automatically generate sensible natural language for programming language in the form of source code, serving the convenience of program development. It is well known that programming languages are highly structured, so previous works attempt to apply structure-based traversal (SBT) or non-sequential models such as Tree-LSTM and graph neural networks (GNN) to learn structural program…
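The structure-based traversal mentioned here linearizes an AST into a bracketed token sequence that a sequence model can consume. A minimal sketch, assuming a toy AST of `(label, children)` tuples; the bracket/label convention follows the commonly described SBT scheme, not any specific paper's exact tokens:

```python
# Hedged sketch of structure-based traversal (SBT) over a toy AST.
# Each node is (label, children); the bracketing lets a flat sequence
# model recover the tree's nesting.
def sbt(node):
    label, children = node
    seq = ["(", label]
    for child in children:
        seq += sbt(child)
    seq += [")", label]
    return seq

# hypothetical mini-AST for a method with a name and a return statement
ast = ("MethodDecl", [("Name", []), ("Body", [("Return", [])])])
print(" ".join(sbt(ast)))
# ( MethodDecl ( Name ) Name ( Body ( Return ) Return ) Body ) MethodDecl
```

The paired open/close brackets around every subtree are what allow the original tree to be reconstructed unambiguously from the flat sequence.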

Code Structure Guided Transformer for Source Code Summarization

This paper proposes a novel approach named SG-Trans that incorporates code structural properties into the Transformer by injecting local symbolic information and global syntactic structure into the self-attention module as an inductive bias, capturing the hierarchical characteristics of code.
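Injecting structure into self-attention as an inductive bias can be pictured as adding a structural prior to the attention scores before the softmax. A minimal numpy sketch in that spirit; the bias matrix here (structural neighbors get 0, other pairs a large penalty) and the toy adjacency are illustrative assumptions, not any paper's exact scheme:

```python
import numpy as np

def structure_biased_attention(Q, K, V, struct_bias):
    """Scaled dot-product attention with an additive structural bias."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + struct_bias      # inject structural prior
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V

rng = np.random.default_rng(0)
n, d = 4, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
# hypothetical AST adjacency: attend freely to structural neighbors,
# heavily penalize unrelated token pairs
adj = np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
bias = np.where(adj > 0, 0.0, -1e9)
out = structure_biased_attention(Q, K, V, bias)
print(out.shape)  # (4, 8)
```

A hard mask like this recovers sparse attention as a special case; a learned, finite-valued bias gives the softer structural prior described above.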

Understanding Long Programming Languages with Structure-Aware Sparse Attention

This paper presents SASA, a Structure-Aware Sparse Attention mechanism that introduces AST structure into attention, reducing complexity and improving performance on long-code understanding tasks.

Automatically Generating Code Comment Using Heterogeneous Graph Neural Networks

  • Dun Jin, Peiyu Liu, Zhenfang Zhu
  • 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2022
This paper proposes a new approach named CCHG that automatically generates code comments by fully learning structural and sequential information from code snippets, using heterogeneous graph networks to process sentence-level and token-level code.

Assemble Foundation Models for Automatic Code Summarization

This work assembles available foundation models, such as CodeBERT and GPT-2, into a single model named AdaMo, and utilizes Gaussian noise as the simulation of contextual information to optimize the latent representation.

API + code = better code summary? insights from an exploratory study

The results show that although API information is helpful for code summarization, the overall performance does not improve over the state-of-the-art Transformer-based approach, leaving immense scope for further research on improving models and leveraging additional API knowledge for code summarization.

Boosting Code Summarization by Embedding Code Structures

It is found that a program dependency graph (PDG) can represent code structure more effectively and improves the SBERT score, which implies that models implemented with PBM generate summaries that are semantically more similar to the reference summary.

An Extractive-and-Abstractive Framework for Source Code Summarization

A novel extractive-and-abstractive framework, called EACS, generates human-written-like summaries with preserved factual details and significantly outperforms state-of-the-art techniques on all three widely used metrics.

GypSum: Learning Hybrid Representations for Code Summarization

GypSum is a new deep learning model that learns hybrid representations using graph attention networks and a pre-trained programming and natural language model; experiments demonstrate its superior performance over existing code summarization models.

On the Evaluation of Neural Code Summarization

A systematic and in-depth analysis of 5 state-of-the-art neural code summarization models on 6 widely used BLEU variants, 4 pre-processing operations and their combinations, and 3 widely used datasets shows that several important factors greatly influence model evaluation, especially the measured performance of the models and the ranking among them.

SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations

Three pre-training tasks are introduced, specifically designed to enable SPT-Code to learn knowledge of source code, the corresponding code structure, and a natural language description of the code without relying on any bilingual corpus, and ultimately to exploit these three sources of information when applied to downstream tasks.

Automatic Code Summarization via Multi-dimensional Semantic Fusing in GNN

This paper proposes a retrieval-augmented mechanism that augments source code semantics with external knowledge to better learn semantics from the joint graph, and a novel attention-based dynamic graph to capture global interactions among nodes in the static graph.

Automatic Source Code Summarization with Extended Tree-LSTM

This work proposes an extension of Tree-LSTM, a generalization of LSTMs for tree-structured data, and applies it to source code summarization, achieving better results than several state-of-the-art techniques.
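The tree-structured recurrence such extensions build on can be sketched with the standard child-sum Tree-LSTM cell, which sums children's hidden states and applies one forget gate per child. A minimal numpy sketch under illustrative assumptions (random weights, toy dimensions); this is the generic cell, not the paper's extension:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ChildSumTreeLSTM:
    """Hedged sketch of a child-sum Tree-LSTM cell for tree data (e.g. ASTs)."""
    def __init__(self, x_dim, h_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1  # small init scale
        self.W = {g: rng.standard_normal((h_dim, x_dim)) * s for g in "ifou"}
        self.U = {g: rng.standard_normal((h_dim, h_dim)) * s for g in "ifou"}
        self.b = {g: np.zeros(h_dim) for g in "ifou"}
        self.h_dim = h_dim

    def node(self, x, children):
        # children: list of (h, c) pairs from already-processed subtrees
        h_sum = sum((h for h, _ in children), np.zeros(self.h_dim))
        i = sigmoid(self.W["i"] @ x + self.U["i"] @ h_sum + self.b["i"])
        o = sigmoid(self.W["o"] @ x + self.U["o"] @ h_sum + self.b["o"])
        u = np.tanh(self.W["u"] @ x + self.U["u"] @ h_sum + self.b["u"])
        c = i * u
        for h_k, c_k in children:  # one forget gate per child subtree
            f_k = sigmoid(self.W["f"] @ x + self.U["f"] @ h_k + self.b["f"])
            c += f_k * c_k
        return o * np.tanh(c), c

cell = ChildSumTreeLSTM(x_dim=4, h_dim=6)
leaf_h, leaf_c = cell.node(np.ones(4), [])            # leaf: no children
root_h, _ = cell.node(np.ones(4), [(leaf_h, leaf_c)]) # parent of the leaf
print(root_h.shape)  # (6,)
```

Because states flow bottom-up from children to parents, the root's state summarizes the whole tree, which is what makes the cell attractive for AST-shaped code.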

A Transformer-based Approach for Source Code Summarization

This work explores the Transformer model, which uses a self-attention mechanism and has been shown to be effective at capturing long-range dependencies, for source code summarization, and shows that despite its simplicity the approach outperforms state-of-the-art techniques by a significant margin.

DeepSumm - Deep Code Summaries using Neural Transformer Architecture

Neural techniques are employed to solve the task of source code summarization; specifically, NMT-based techniques are compared with the simpler and more appealing Transformer architecture on a dataset of Java methods and comments.

Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning

A novel end-to-end model is proposed that significantly improves code retrieval results over state-of-the-art models and achieves competitive performance in terms of BLEU score on the code summarization task.

Improved Code Summarization via a Graph Neural Network

This paper presents an approach that uses a graph-based neural architecture, better matched to the default structure of the AST, to generate summaries, showing improvement over four baseline techniques.
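A graph-based architecture over an AST typically aggregates each node's features from its structural neighbors. A minimal sketch of one graph-convolution step in that spirit; the mean-over-neighbors normalization and the toy 4-node AST are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One hedged graph-convolution step: mean-aggregate neighbor
    features (with self-loops), project, apply ReLU."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)   # per-node degree
    return np.maximum(0.0, (A_hat / deg) @ H @ W)

# hypothetical 4-node AST: node 0 is the root with children 1 and 2;
# node 2 has child 3
A = np.zeros((4, 4))
for u, v in [(0, 1), (0, 2), (2, 3)]:
    A[u, v] = A[v, u] = 1.0

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))  # initial node features (e.g. AST-type embeddings)
W = rng.standard_normal((8, 8))  # learnable projection
H1 = gcn_layer(A, H, W)
print(H1.shape)  # (4, 8)
```

Stacking a few such layers lets information propagate along AST edges, so a node's representation reflects its syntactic context rather than just its token.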

Summarizing Source Code with Transferred API Knowledge

Experiments on large-scale real-world industry Java projects indicate that the proposed novel approach, named TL-CodeSum, is effective and outperforms the state-of-the-art in code summarization.

Improving Automatic Source Code Summarization via Deep Reinforcement Learning

  • Yao Wan, Zhou Zhao, Philip S. Yu
  • 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), 2018
An abstract syntax tree structure as well as the sequential content of code snippets are incorporated into a deep reinforcement learning framework (i.e., an actor-critic network), which provides the confidence of predicting the next word according to the current state, with an advantage reward composed of the BLEU metric used to train both networks.

A Neural Model for Generating Natural Language Summaries of Program Subroutines

This paper presents a neural model that combines words from code with code structure from an AST, which allows the model to learn code structure independent of the text in code.

Summarizing Source Code using a Neural Attention Model

This paper presents the first completely data-driven approach for generating high-level summaries of source code, using Long Short-Term Memory (LSTM) networks with attention to produce sentences that describe C# code snippets and SQL queries.