Toward human-like summaries generated from heterogeneous software artefacts

Mahfouth Alghamdi, Christoph Treude, Markus Wagner
Proceedings of the Genetic and Evolutionary Computation Conference Companion
Automatic text summarisation has drawn considerable interest in the field of software engineering. It can improve the efficiency of software developers, enhance the quality of products, and ensure timely delivery. In this paper, we present our initial work towards automatically generating human-like multi-document summaries from heterogeneous software artefacts. Our analysis of the text properties of 545 human-written summaries from 15 software engineering projects will ultimately guide… 


Human-Like Summaries from Heterogeneous and Time-Windowed Software Development Artefacts
This work presents the first framework for summarising multi-document software artefacts containing heterogeneous data within a given time frame and employs a range of iterative heuristics to minimise the cosine-similarity between texts and high-dimensional feature vectors.
A Survey on Deep Learning based Various Methods Analysis of Text Summarization
This paper studies the various methods used for text summarization, observes the trends, developments, and accomplishments, and explores new dimensions for future work in this expanding field.


Towards automatically generating summary comments for Java methods
A novel technique is presented that, given the signature and body of a Java method, identifies the content for the summary and generates natural language text that summarizes the method's overall actions.
Automatic generation of natural language summaries for Java classes
This paper presents a technique to automatically generate human-readable summaries for Java classes, assuming no documentation exists, and determines that the summaries are readable and understandable, do not include extraneous information, and, in most cases, are not missing essential information.
Automatic Summarization of Bug Reports
It is found that summaries helped the study participants save time, that there was no evidence that accuracy degraded when summaries were used and that most participants preferred working with summaries to working with original bug reports.
Text Summarization Techniques: A Brief Survey
The main approaches to automatic text summarization are described, along with the effectiveness and shortcomings of the different methods.
Summarizing and measuring development activity
It is found that unexpected events are as important as expected events in summaries of what a developer did, and that many developers do not believe in measuring development activity.
Discovering essential code elements in informal documentation
A novel traceability recovery approach to extract the code elements contained in various documents that does not require an index of code elements to find links, which makes it particularly well-suited for the analysis of informal documentation.
The Challenges of Automatic Summarization
Researchers are investigating summarization tools and methods that automatically extract or abstract content from a range of information sources, including multimedia, looking at approaches which roughly fall into two categories: knowledge-poor and knowledge-rich.
The Automatic Creation of Literature Abstracts
In the exploratory research described, the complete text of an article in machine-readable form is scanned by an IBM 704 data-processing machine and analyzed in accordance with a standard program.
Social coding in GitHub: transparency and collaboration in an open software repository
It is found that people make a surprisingly rich set of social inferences from the networked activity information in GitHub, such as inferring someone else's technical goals and vision when they edit code, or guessing which of several similar projects has the best chance of thriving in the long term.
Visualizing Data using t-SNE
A new technique called t-SNE that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map, a variation of Stochastic Neighbor Embedding that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.