Publications
A Neural Model for Generating Natural Language Summaries of Program Subroutines
TLDR
This paper presents a neural model that combines words from code with code structure from an abstract syntax tree (AST), allowing the model to learn code structure independently of the text in the code.
Automatically generating commit messages from diffs using neural machine translation
TLDR
This paper adapts Neural Machine Translation (NMT) to automatically "translate" diffs into commit messages, and designs a quality-assurance filter that detects cases in which the algorithm is unable to produce good messages and returns a warning instead.
Improved Code Summarization via a Graph Neural Network
TLDR
This paper presents an approach that uses a graph-based neural architecture, which better matches the default structure of the AST, to generate natural language summaries of source code, and shows improvement over four baseline techniques.
Portfolio: finding relevant functions and their usage
TLDR
The results show with strong statistical significance that users find more relevant functions with higher precision with Portfolio than with Google Code Search and Koders.
Detecting similar software applications
TLDR
The results show with strong statistical significance that CLAN automatically detects similar applications in a large repository of 8,310 Java applications with higher precision than MUDABlue.
Automatic documentation generation via source code summarization of method context
TLDR
This paper proposes a documentation generation technique that includes the context of Java methods by analyzing how those methods are invoked, and finds that programmers benefit from the generated documentation because it includes context information.
Automatic Source Code Summarization of Context for Java Methods
TLDR
This paper proposes a source code summarization technique that writes English descriptions of Java methods by analyzing how those methods are invoked, and finds that, while the generated summaries do not reach the quality of human-written ones, they improve over the state-of-the-art summarization tool in several dimensions by a statistically significant margin.
On using machine learning to automatically classify software applications into domain categories
TLDR
This paper proposes an approach that categorizes software projects without any source code, using a small number of API calls as attributes, and carries out a comprehensive empirical evaluation of automatic categorization approaches.
When and How Using Structural Information to Improve IR-Based Traceability Recovery
TLDR
This paper proposes using the feedback the software engineer provides when classifying candidate links to regulate the effect of structural information, and its results suggest that this approach outperforms both a pure IR-based method and a simple approach for combining textual and structural information.
An empirical investigation into a large-scale Java open source code repository
TLDR
This work poses 32 research questions, explains the rationale behind them, and answers them using facts obtained from 2,080 randomly chosen Java applications from SourceForge, finding, among other results, that most methods take zero or one arguments or do not return any values.