# Better software analytics via “DUO”: Data mining algorithms using/used-by optimizers

@article{Agrawal2020BetterSA,
title={Better software analytics via “DUO”: Data mining algorithms using/used-by optimizers},
author={Amritanshu Agrawal and Tim Menzies and Leandro L. Minku and Markus Wagner and Zhe Yu},
journal={Empirical Software Engineering},
year={2020},
volume={25},
pages={2099-2136}
}
• Published 4 December 2018
• Computer Science
• Empirical Software Engineering
This paper claims that a new field of empirical software engineering research and practice is emerging: data mining using/used-by optimizers for empirical studies, or DUO. For example, data miners can generate models that are explored by optimizers. Also, optimizers can advise how to best adjust the control parameters of a data miner. This combined approach acts like an agent leaning over the shoulder of an analyst that advises “ask this question next” or “ignore that problem, it is not…

### When, and Why, Simple Methods Fail. Lessons Learned from Hyperparameter Tuning in Software Analytics (and Elsewhere)

• Computer Science
• 2019
The conclusion will be that these special properties of SE data can be exploited to great effect to find better optimizations for SE tasks via a tactic called "dodging" (explained in this paper).

### How to “DODGE” Complex Software Analytics

• Computer Science
IEEE Transactions on Software Engineering
• 2021
By ignoring redundant tunings, DODGE, a tuning tool, runs orders of magnitude faster, while also generating learners with more accurate predictions than seen in prior state-of-the-art approaches.

### An Empirical Study of Model-Agnostic Techniques for Defect Prediction Models

• Computer Science
IEEE Transactions on Software Engineering
• 2022
It is concluded that model-agnostic techniques are needed to explain individual predictions of defect models and that more than half of the practitioners perceive that the contrastive explanations are necessary and useful to understand the predictions of defects.

### Is AI different for SE?

• Computer Science
• 2019
A new kind of SE research is needed for developing new AI tools that are more suited to SE problems, as it is shown that standard AI tools work best when the target is relatively more frequent.

### Predicting health indicators for open source projects (using hyperparameter optimization)

• Computer Science
Empirical Software Engineering
• 2022
This is the largest study yet conducted, using recent data for predicting multiple health indicators of open-source projects, and finds that traditional estimation algorithms make many mistakes.

### Is AI different for SE?

• Computer Science
ArXiv
• 2019
Standard AI tools work best when the target is relatively more frequent, and a new kind of SE research is needed for developing new AI tools that are more suited to SE problems.

### A Pragmatic Approach for Hyper-Parameter Tuning in Search-based Test Case Generation

• Computer Science
Empir. Softw. Eng.
• 2021
A new metric is proposed (“Tuning Gain”), which estimates how cost-effective tuning a particular class is, and a tuning approach called Meta-GA is used, which shows that for a low tuning budget, prioritizing classes outperforms the alternatives in terms of extra covered branches.

### What makes a good Node.js package? Investigating Users, Contributors, and Runnability

• Computer Science
ArXiv
• 2021
This study conducts a survey asking Node.js developers to evaluate the importance of 30 features derived from existing work, including GitHub activity, software usability, and properties of the repository and documentation, and finds that predicting the runnability of packages is viable.

### VEER: A Fast and Disagreement-Free Multi-objective Configuration Optimizer

• Computer Science
• 2021
This paper shows that model disagreement can be mitigated via VEER, a one-dimensional approximation to the N-objective space, which is recommended as a very fast method to solve complex configuration problems, while at the same time avoiding model disagreement.

### Predicting Good Configurations for GitHub and Stack Overflow Topic Models

• Computer Science
2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)
• 2019
It is found that popular rules of thumb for topic modelling parameter configuration are not applicable to the corpora used in the experiments. Corpora sampled from GitHub and Stack Overflow have different characteristics and require different configurations to achieve good model fit, yet good configurations for unseen corpora can be predicted reliably.

## References

### Data-Driven Search-Based Software Engineering

• Computer Science
2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)
• 2018
It is argued that combining these two fields is useful for situations which require learning from a large data source or when optimizers need to know the lay of the land to find better solutions, faster.

### Perspectives on Data Science for Software Engineering

• Computer Science
Perspectives on Data Science for Software Engineering
• 2016

### Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models

• Computer Science
Automated Software Engineering
• 2016
Dynamic Cross-company Learning (DCL) is proposed to dynamically identify which within-company (WC) or cross-company (CC) past models are most useful for making predictions for a given company at the present, and it automatically emphasizes the predictions given by these models in order to improve predictive performance.

### Building Better Quality Predictors Using "ε-Dominance"

• Computer Science
ArXiv
• 2018
DART, an algorithm especially selected to succeed for large ε software quality prediction problems, is explored, which dramatically outperforms three sets of state-of-the-art defect prediction methods.

### On the value of user preferences in search-based software engineering: A case study in software product lines

• Computer Science
2013 35th International Conference on Software Engineering (ICSE)
• 2013
The conclusion is that search-based software engineering methods need to change, particularly when studying complex decision spaces, since methods in widespread use perform much worse than IBEA (Indicator-Based Evolutionary Algorithm).

### Building Better Quality Predictors Using "$\epsilon$-Dominance"

• Computer Science
• 2018
DART, an algorithm especially selected to succeed for large $\epsilon$ software quality prediction problems, is explored, which dramatically outperforms three sets of state-of-the-art defect prediction methods.