Automated duplicate detection for bug tracking systems

@inproceedings{Jalbert2008AutomatedDD,
  title={Automated duplicate detection for bug tracking systems},
  author={Nicholas Jalbert and Westley Weimer},
  booktitle={2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN)},
  year={2008},
  pages={52--61}
}
  • Nicholas Jalbert, Westley Weimer
  • Published 2008
  • Computer Science
  • 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN)
Bug tracking systems are important tools that guide the maintenance activities of software developers. The utility of these systems is hampered by an excessive number of duplicate bug reports; in some projects as many as a quarter of all reports are duplicates. Developers must manually identify duplicate bug reports, but this identification process is time-consuming and exacerbates the already high cost of software maintenance. We propose a system that automatically classifies duplicate bug…
An HMM-based approach for automatic detection and classification of duplicate bug reports
TLDR
A novel approach that automatically detects duplicate bug reports using stack traces and Hidden Markov Models is proposed, and it is shown that HMMs and stack traces are a powerful combination for detecting and classifying duplicate bug reports in large bug repositories.
Duplicate Bug Report Detection Using Clustering
TLDR
This paper proposes a new method based on clustering to identify a larger proportion of duplicate bug reports while keeping the false positives of misidentified non-duplicates low and achieves better performance in terms of a harmonic measure that combines true positive and true negative rates when compared to the existing methods.
DupFinder: integrated tool support for duplicate bug report detection
TLDR
A tool named DupFinder is proposed, which implements the state-of-the-art unsupervised duplicate bug report approach by Runeson et al. as a Bugzilla extension; it does not require any training data and thus can easily be deployed to any project.
Automated Bug Reporting System in Web Applications
TLDR
This paper designs a framework that automatically generates the bug report and provides report logging, reusable methods, and multiple-browser support, which helps reduce the human effort and time required to perform regression testing.
A Bug Rule Based Technique with Feedback for Classifying Bug Reports
  • Tao Zhang, Byungjeong Lee
  • Computer Science
  • 2011 IEEE 11th International Conference on Computer and Information Technology
  • 2011
TLDR
A bug-rule-based classification technique is proposed to save developers' time in software maintenance by incorporating a developer feedback mechanism, and is expected to improve the accuracy of bug report retrieval.
A Systematic Study of Duplicate Bug Report Detection
TLDR
Research in this field is systematically presented by classifying the works into three categories and listing the methods used in each, along with the strengths, limitations, data sets, and major approaches of the popular papers.
Finding Duplicates of Your Yet Unwritten Bug Report
  • Johannes Lerch, M. Mezini
  • Computer Science
  • 2013 17th European Conference on Software Maintenance and Reengineering
  • 2013
TLDR
This work proposes an approach that uses only stack traces and their structure as input to machine-learning algorithms for detecting bug-report duplicates, and shows that this approach performs as well as state-of-the-art techniques, but without requiring the whole text corpus of a bug report to be available.
Detecting bug duplicate reports through local references
TLDR
The performance of Information Retrieval techniques can be significantly improved by guiding the search for duplicates toward specific portions of the bug repository, resulting in higher detection rates and constant classification runtime.
Improved Duplicate Bug Report Identification
TLDR
This paper extends Jalbert and Weimer's work by improving the accuracy of automated duplicate bug report identification; experiments with bug reports from the Mozilla bug tracking system find that this approach could be improved by about 160%.
Performance of IR Models on Duplicate Bug Report Detection: A Comparative Study
TLDR
This thesis realizes an Online Duplicate Detection Framework that uses a sliding window of a constant time frame as a first step towards simulating incoming bug reports and recommending duplicates to the end user, and finds that word-based models, in particular a Log-Entropy-based weighting scheme, outperform topic-based ones such as LSI and LDA.

References

SHOWING 1-10 OF 22 REFERENCES
Modeling bug report quality
TLDR
A descriptive model of bug report quality based on a statistical analysis of surface features of over 27,000 publicly available bug reports for the Mozilla Firefox project is presented and shows that it can reduce the overall cost of software maintenance in a setting where the average cost of addressing a bug report is more than 2% of the cost of ignoring an important bug report.
How Software Repositories can Help in Resolving a New Change Request
TLDR
The hypothesis is that data stored in software repositories are a good descriptor of how past change requests have been resolved, and can be useful to identify the most appropriate developers to resolve a new request, or to predict the set of impacted source files.
Who should fix this bug?
TLDR
This paper applies a machine learning algorithm to the open bug repository to learn the kinds of reports each developer resolves and reaches precision levels of 57% and 64% on the Eclipse and Firefox development projects respectively.
How Long Will It Take to Fix This Bug?
TLDR
This work presents an approach that automatically predicts the fixing effort, i.e., the person-hours spent on fixing an issue, using the Lucene framework to search for similar, earlier reports and using their average time as a prediction.
How long did it take to fix bugs?
TLDR
This report computes the bug-fix time of files in ArgoUML and PostgreSQL by identifying when bugs are introduced and when they are fixed, and identifies the top 20 bug-fix time files of the two projects.
Coping with an open bug repository
TLDR
An initial characterization of two open bug repositories from the Eclipse and Firefox projects is provided, the duplicate bug and bug triage problems that arise with these open bug repositories are described, and how machine learning technology is applied to help automate these processes is discussed.
Detection of Duplicate Defect Reports Using Natural Language Processing
TLDR
This work investigates using natural language processing (NLP) techniques to identify duplicates in defect reports at Sony Ericsson Mobile Communications, and shows that about 2/3 of the duplicates can possibly be found using the NLP techniques.
Automatic bug triage using text categorization
TLDR
This paper proposes to apply machine learning techniques to assist in bug triage by using text categorization to predict the developer that should work on the bug based on the bug's description.
An Overview of the Software Engineering Process and Tools in the Mozilla Project
TLDR
The software engineering aspect of a large Open Source project is described and the software engineering tools used in the Mozilla Project are covered, since the Mozilla process and tools are intimately related.
Speeding up requirements management in a product software company: linking customer wishes to product requirements through linguistic engineering
TLDR
This work presents a pragmatic linguistic engineering approach to how statistical natural language processing may be used to support the manual linkage between customer wishes and product requirements by suggesting potential links.