Efficient feature extraction model for validation performance improvement of duplicate bug report detection in software bug triage systems

@article{SoleimaniNeysiani2020EfficientFE,
  title={Efficient feature extraction model for validation performance improvement of duplicate bug report detection in software bug triage systems},
  author={Behzad Soleimani Neysiani and Seyed Morteza Babamir and Masayoshi Aritsugi},
  journal={Inf. Softw. Technol.},
  year={2020},
  volume={126},
  pages={106344}
}
An Empirical Study on Bug Severity Estimation Using Source Code Metrics and Static Analysis
TLDR
A quantitative and qualitative study on two popular datasets, using 10 common source code metrics and two popular static analysis tools to assess their capability in predicting defects and their severity, shows that code metrics and static analysis methods can be complementary in estimating bug severity.
Improving Recommender Systems Performances Using User Dimension Expansion by Movies’ Genres and Voting-Based Ensemble Machine Learning Technique
TLDR
The proposed approach focuses on modeling categories by averaging rates of movie genres and will be improved by voting machine learning classifiers on multilayer perceptron (MLP) neural networks and k-nearest neighbors (kNN) algorithms.
DEFTri: A Few-Shot Label Fused Contextual Representation Learning For Product Defect Triage in e-Commerce
TLDR
This work proposes a novel framework for automated defect triage (DEFTri) using fine-tuned state-of-the-art pre-trained BERT on labels fused text embeddings to improve contextual representations from human-generated product defects.
Discrete Island-Based Cuckoo Search with Highly Disruptive Polynomial Mutation and Opposition-Based Learning Strategy for Scheduling of Workflow Applications in Cloud Environments
TLDR
The overall experimental and statistical results indicate that DiCSPM provides solutions for the scheduling problem of workflows in cloud computing environment faster than the other compared algorithms.

References

Improving Performance of Automatic Duplicate Bug Reports Detection using Longest Common Sequence : Introducing New Textual Features for Textual Similarity Detection
TLDR
Experimental results show that LCS-based features are important: the accuracy, precision, and recall of classifier prediction models improved by 4.5, 2.5, and 2.5 percent, respectively, on average after using LCS, reaching up to 96, 98, and 97 percent, respectively, on average across different classifiers.
Duplicate bug report detection with a combination of information retrieval and topic modeling
TLDR
DBTM is introduced, a duplicate bug report detection approach that takes advantage of both IR-based features and topic-based features, and is able to learn the sets of different terms describing the same technical issues and to detect other not-yet-identified duplicate ones.
Detecting Duplicate Bug Report Using Character N-Gram-Based Features
TLDR
This study investigates the usefulness of low-level features based on characters which have certain inherent advantages over word-based features for the problem of duplicate bug report detection, and concludes that the approach is effective.
DURFEX: A Feature Extraction Technique for Efficient Detection of Duplicate Bug Reports
TLDR
This paper proposes a feature extraction technique that reduces the feature size and yet retains the information that is most critical for the classification of duplicate bug reports, and outperforms the approach that uses distinct function names, while significantly reducing the processing time.
Detecting duplicate bug reports with software engineering domain knowledge
TLDR
Evaluating this software-literature context method on real-world bug reports produces useful results that indicate this semi-automated method has the potential to substantially decrease the manual effort used in contextual bug deduplication while suffering only a minor loss in accuracy.
Automated duplicate detection for bug tracking systems
TLDR
This system uses surface features, textual semantics, and graph clustering to predict duplicate status and is able to reduce development cost by filtering out 8% of duplicate bug reports while allowing at least one report for each real defect to reach developers.
A contextual approach towards more accurate duplicate bug report detection and ranking
TLDR
This work investigates how contextual information about software-quality attributes, software-architecture terms, and system-development topics can be exploited to improve bug deduplication, and concludes that taking into account domain-specific context can improve IR methods for bug deduplication.
Towards more accurate retrieval of duplicate bug reports
TLDR
A retrieval function (REP) to measure the similarity between two bug reports, which fully utilizes the information available in a bug report including not only the similarity of textual content in summary and description fields, but also similarity of non-textual fields such as product, component, version, etc.