Suggesting accurate method and class names

@inproceedings{Allamanis2015SuggestingAM,
  title={Suggesting accurate method and class names},
  author={Miltiadis Allamanis and Earl T. Barr and Christian Bird and Charles Sutton},
  booktitle={Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering},
  year={2015}
}
Descriptive names are a vital part of readable, and hence maintainable, code. Recent progress on automatically suggesting names for local variables tantalizes with the prospect of replicating that success with method and class names. However, suggesting names for methods and classes is much more difficult. This is because good method and class names need to be functionally descriptive, but suggesting such names requires that the model go beyond local context. We introduce a neural…
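The abstract is cut off above, but the core idea it sets up, suggesting a name by searching an embedding space shared by names and their usage contexts, can be conveyed with a minimal sketch. Everything below (the toy embedding table, suggest_name, the candidate list) is a hypothetical illustration, not the paper's actual model, which is a neural log-bilinear language model over identifier sub-tokens:

```python
# Minimal sketch of suggesting a name by nearest-neighbour search in an
# embedding space shared by names and usage contexts. All names and the
# toy vocabulary here are hypothetical illustrations.
import re
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table; a real system learns these vectors from a corpus.
token_embeddings = {t: rng.normal(size=8) for t in
                    ["get", "set", "name", "count", "return", "field"]}
candidate_names = ["getName", "setName", "getCount"]

def sub_tokens(identifier):
    """Split camelCase into lowercase sub-tokens: 'getName' -> ['get', 'name']."""
    return [s.lower() for s in re.findall(r"[A-Z]?[a-z]+", identifier)]

def embed(tokens):
    """Average the embeddings of known tokens into a crude context vector."""
    vecs = [token_embeddings[t] for t in tokens if t in token_embeddings]
    return np.mean(vecs, axis=0)

def suggest_name(body_tokens, candidates):
    """Rank candidate names by cosine similarity to the body's context vector."""
    ctx = embed(body_tokens)
    def score(name):
        v = embed(sub_tokens(name))
        return (ctx @ v) / (np.linalg.norm(ctx) * np.linalg.norm(v))
    return max(candidates, key=score)

print(suggest_name(["return", "name", "field"], candidate_names))
```

The point of the shared space is that a method body and a good name for it end up close together even when they share no literal tokens; the averaging above is the crudest possible stand-in for that.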

Citations

Suggesting Natural Method Names to Check Name Consistencies
TLDR
MNire, a machine learning approach to check the consistency between the name of a given method and its implementation, is introduced and used to detect inconsistent methods and suggest new names in several active GitHub projects, showing MNire's usefulness.
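To make "consistency between name and implementation" concrete, here is a deliberately naive overlap heuristic. MNire itself generates names with a learned model; the sketch below is only an illustration of the check, and its helper names are hypothetical:

```python
import re

def sub_tokens(identifier):
    """Split a camelCase identifier into lowercase sub-tokens."""
    return {s.lower() for s in re.findall(r"[A-Z]?[a-z0-9]+", identifier)}

def name_consistency(method_name, body_identifiers):
    """Jaccard overlap between name sub-tokens and sub-tokens in the body.
    A low score hints that the name may not describe the implementation."""
    name_toks = sub_tokens(method_name)
    body_toks = set().union(*(sub_tokens(i) for i in body_identifiers))
    return len(name_toks & body_toks) / len(name_toks | body_toks)

# A name whose sub-tokens barely occur in the body scores low:
print(name_consistency("getUserName", ["connection", "openSocket", "port"]))
```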
VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning
TLDR
The proposed VarCLR, a new approach for learning semantic representations of variable names that effectively captures variable similarity in this stricter sense, enables the effective application of sophisticated, general-purpose language models like BERT to variable name representation, and thus also to related downstream tasks like variable name similarity search or spelling correction.
Learning to Recommend Method Names with Global Context
TLDR
GTNM, a Global Transformer-based Neural Model for method name suggestion, which considers the local context, the project-specific context, and the documentation of the method simultaneously, is proposed.
A Neural Model for Method Name Generation from Functional Description
TLDR
A neural network is proposed to directly generate readable method names from natural-language descriptions, handling the vocabulary explosion that arises in large repositories and leveraging the knowledge learned from large repositories in a specific project.
IdBench: Evaluating Semantic Representations of Identifier Names in Source Code
TLDR
This paper uses IdBench, the first benchmark for evaluating semantic representations against a ground truth created from thousands of ratings by 500 software developers, to study state-of-the-art embedding techniques proposed for natural language, an embedding technique specifically designed for source code, and lexical string distance functions.
Towards a Naming Quality Model
TLDR
Initial results show that the combination of a rule-based approach and a deep learning model can correctly indicate what names need attention, and that combining syntactic and semantic information yields better results than either of them by itself.
Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Learning Approaches
TLDR
This work focuses on method names and studies how a descriptive name can be automatically generated from a method's body, experimenting with two approaches from the field of text summarization: one based on TF-IDF and the other on a deep recurrent neural network.
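A minimal sketch of the TF-IDF variant of that idea: weight each term in a method body by TF-IDF against a corpus of method bodies, and treat the top-scoring terms as ingredients for a name. The three-document corpus below is hypothetical; the actual study's pipeline is more involved:

```python
import math
import re
from collections import Counter

def tokens(text):
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]

# Hypothetical corpus of method bodies; a real study uses thousands.
corpus = [
    "return this.userName",
    "this.userName = newName",
    "for (item : items) total += item.price",
]

def tfidf_terms(body, corpus, k=2):
    """Return the k body terms with the highest TF-IDF weight."""
    docs = [set(tokens(d)) for d in corpus]
    tf = Counter(tokens(body))
    n = len(corpus)
    def idf(term):
        df = sum(term in d for d in docs)
        return math.log((n + 1) / (df + 1)) + 1  # smoothed IDF
    return sorted(tf, key=lambda t: tf[t] * idf(t), reverse=True)[:k]

# The highest-weighted body terms become candidate name ingredients:
print(tfidf_terms("return this.userName", corpus))
```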
Nomen est Omen: Exploring and Exploiting Similarities between Argument and Parameter Names
TLDR
An empirical study of the lexical similarity between arguments and parameters of methods, which is one prominent situation where names can provide otherwise missing information, finds that, for most arguments, the similarity is either very high or very low, and that short and generic names often cause low similarities.
Deep Generation of Coq Lemma Names Using Elaborated Terms
TLDR
These models, based on multi-input neural networks, are the first to leverage syntactic and semantic information from Coq's lexer, parser, and kernel for naming; the key insight is that learning from elaborated terms can substantially boost model performance.
...

References

Showing 1-10 of 53 references
GloVe: Global Vectors for Word Representation
TLDR
A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
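For reference, the weighted least-squares objective GloVe minimizes over the word co-occurrence matrix $X$ is

```latex
J \;=\; \sum_{i,j=1}^{V} f(X_{ij})\,\bigl(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^{2},
\qquad
f(x) \;=\;
\begin{cases}
(x/x_{\max})^{\alpha} & \text{if } x < x_{\max},\\[2pt]
1 & \text{otherwise,}
\end{cases}
```

where $w_i$ and $\tilde{w}_j$ are word and context vectors, $b_i$ and $\tilde{b}_j$ are biases, and the weighting $f$ damps the influence of rare and very frequent co-occurrences.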
A statistical semantic language model for source code
TLDR
SLAMC is introduced, a novel statistical semantic language model for source code that incorporates semantic information into code tokens and models the regularities/patterns of such semantic annotations, called sememes, rather than their lexemes.
Linguistic Regularities in Continuous Space Word Representations
TLDR
The vector-space word representations that are implicitly learned by the input-layer weights are found to be surprisingly good at capturing syntactic and semantic regularities in language, and that each relationship is characterized by a relation-specific vector offset.
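The relation-specific vector offset is easy to state concretely: to answer "a is to b as c is to ?", search for the vocabulary word closest to $w_b - w_a + w_c$. A minimal sketch with a toy embedding table (real evaluations use learned embeddings):

```python
import numpy as np

def analogy(emb, a, b, c):
    """Return the word whose vector is most cosine-similar to b - a + c."""
    target = emb[b] - emb[a] + emb[c]
    target /= np.linalg.norm(target)
    best, best_sim = None, -1.0
    for word, vec in emb.items():
        if word in (a, b, c):          # exclude the query words themselves
            continue
        sim = (vec @ target) / np.linalg.norm(vec)
        if sim > best_sim:
            best, best_sim = word, sim
    return best

# Toy vectors chosen so that king - man + woman lands near queen.
emb = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([0.0, 1.0]),
    "king":  np.array([1.0, 0.5]),
    "queen": np.array([0.0, 1.5]),
}
print(analogy(emb, "man", "king", "woman"))  # -> "queen"
```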
A fast and simple algorithm for training neural probabilistic language models
TLDR
This work proposes a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly introduced procedure for estimating unnormalized continuous distributions, and demonstrates the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary.
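In the notation commonly used for this method, for each word $w$ in context $h$, with $k$ samples $x_1,\dots,x_k$ drawn from a noise distribution $P_n$, the per-position objective takes the form

```latex
J_{h}(\theta) \;=\; \log \sigma\!\bigl(\Delta s_{\theta}(w, h)\bigr)
\;+\; \sum_{i=1}^{k} \log\Bigl(1 - \sigma\!\bigl(\Delta s_{\theta}(x_i, h)\bigr)\Bigr),
\qquad
\Delta s_{\theta}(w, h) \;=\; s_{\theta}(w, h) - \log\bigl(k\,P_n(w)\bigr),
```

where $s_{\theta}(w, h)$ is the model's unnormalized log-score and $\sigma$ the logistic sigmoid: the model learns to distinguish observed words from noise samples, avoiding the softmax normalization over the full vocabulary.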
Debugging Method Names
TLDR
This paper shows that naming conventions can go much further: one can mechanically check whether or not a method name and implementation are likely to be good matches for each other, and it presents an approach for automatically suggesting more suitable names in the presence of a mismatch between name and implementation.
Predicting Program Properties from "Big Code"
TLDR
This work formulates the problem of inferring program properties as structured prediction and shows how to perform both learning and inference in this context, opening up new possibilities for attacking a wide range of difficult problems in the context of "Big Code", including invariant generation, decompilation, synthesis, and others.
On the Use of Automated Text Summarization Techniques for Summarizing Source Code
TLDR
The paper presents a solution that strikes a balance between the two approaches, i.e., short and accurate textual descriptions that illustrate the software entities without requiring the reader to go through the details of the implementation.
Using IR methods for labeling source code artifacts: Is it worthwhile?
TLDR
Results indicate that clustering-based approaches (LSI and LDA) are much more worthwhile when used on source code artifacts with high verbosity, as well as on artifacts that require more effort to label manually.
Learning natural coding conventions
TLDR
NATURALIZE, a framework that learns the style of a codebase and suggests revisions to improve stylistic consistency, is presented; it builds on recent work in applying statistical natural language processing to source code.
REPENT: Analyzing the Nature of Identifier Renamings
TLDR
This paper proposes REnaming Program ENTities (REPENT), an approach to automatically detect, classify, and document identifier renamings in source code, and evaluates the accuracy and completeness of REPENT on the evolution history of five open-source Java programs.
...