Annibale Panichella

Learn More
Information Retrieval (IR) methods, and in particular topic models, have recently been used to support essential software engineering (SE) tasks, by enabling software textual retrieval and analysis. In all these approaches, topic models have been used on software artifacts in a similar manner as they were used on natural language documents (e.g., using the(More)
Existing defect prediction models use product or process metrics and machine learning methods to identify defect-prone source code entities. Different classifiers (e.g., linear regression, logistic regression, or classification trees) have been investigated in the last decade. The results achieved so far are sometimes contrasting and do not show a clear(More)
Cross-project defect prediction is very appealing because (i) it allows predicting defects in projects for which the availability of data is limited, and (ii) it allows producing generalizable prediction models. However, existing research suggests that cross-project prediction is particularly challenging and, due to heterogeneity of projects, prediction(More)
The intensive human effort needed to manually manage traceability information has increased the interest in utilising semi-automated traceability recovery techniques. This paper presents a simple way to improve the accuracy of traceability recovery methods based on Information Retrieval techniques. The proposed method acts on the artefact indexing(More)
Information Retrieval (IR) has been widely accepted as a method for automated traceability recovery based on the textual similarity among the software artifacts. However, a notorious difficulty for IR-based methods is that artifacts may be related even if they are not textually similar. A growing body of work addresses this challenge by combining IR-based(More)
Information Retrieval (IR) techniques have been used for various software engineering tasks, including the labeling of software artifacts by extracting “keywords” from them. Such techniques include Vector Space Models, Latent Semantic Indexing, Latent Dirichlet Allocation, as well as customized heuristics extracting words from specific source code elements.(More)
The research community in Software Engineering and Software Testing in particular builds many of its contributions on a set of mutually shared expectations. Despite the fact that they form the basis of many publications as well as open-source and commercial testing applications, these common expectations and beliefs are rarely ever questioned. For example,(More)
Information Retrieval methods have been largely adopted to identify traceability links based on the textual similarity of software artifacts. However, noise due to word usage in software artifacts might negatively affect the recovery accuracy. We propose the use of smoothing filters to reduce the effect of noise in software artifacts and improve the(More)
One of the most successful applications of textual analysis in software engineering is the use of Information Retrieval (IR) methods to reconstruct traceability links between software artifacts. Unfortunately, due to the limitations of both the humans developing artifacts and the IR techniques any IR-based traceability recovery method fails to retrieve some(More)