Predicting health indicators for open source projects (using hyperparameter optimization)

@article{Xia2020PredictingHI,
  title={Predicting health indicators for open source projects (using hyperparameter optimization)},
  author={Tianpei Xia and Wei Fu and Rui Shu and Rishabh Agrawal and Tim Menzies},
  journal={Empirical Software Engineering},
  year={2020},
  volume={27}
}
Software developed on public platforms is a source of data that can be used to make predictions about those projects. While individual development activity may be random and hard to predict, development behavior at the project level can be predicted with good accuracy when large groups of developers work together on software projects. To demonstrate this, we use 64,181 months of data from 1,159 GitHub projects to make various predictions about the recent status of those projects (as of April… 

Optimizing Predictions for Very Small Data Sets: a case study on Open-Source Project Health Prediction

The landscape analytics method SNEAK is presented; it is both faster and more effective than prior state-of-the-art hyperparameter optimization algorithms (FLASH, HYPEROPT, OPTUNA, and differential evolution) and might be useful in other “data-light” SE domains.

Approach to Formalizing Software Projects for Solving Design Automation and Project Management Tasks

This paper describes a knowledge base model and a diagnostic analytics method for solving design automation and project management tasks, and presents example use cases for the proposed approach.

References

Monitoring the "health" status of open source web-engineering projects

A concept of “health” indicators and an evaluation process that can help to get a status overview of OSS projects in a timely fashion and predict project survivability based on the project data available on web repositories are proposed.

Automated Parameter Optimization of Classification Techniques for Defect Prediction Models

This paper concludes that parameter settings can indeed have a large impact on the performance of defect prediction models, suggesting that researchers should experiment with the parameters of the classification techniques.

Is "Better Data" Better Than "Better Data Miners"?

For software analytic tasks like defect prediction, data pre-processing can be more important than classifier choice, ranking studies are incomplete without such pre-processing, and SMOTUNED is a promising candidate for pre-processing.

The Impact of Automated Parameter Optimization on Defect Prediction Models

It is found that traditionally overlooked techniques like C5.0 and neural networks can actually outperform widely-used techniques after optimization is applied, highlighting the importance of exploring the parameter space when using parameter-sensitive classification techniques.

Predicting the number of forks for open source software project

This paper uses stepwise regression to design a model that predicts the number of forks for open source software projects on GitHub; the model has high prediction accuracy and allows users to set the combination of time parameters to suit their own needs.

A Comparative Study to Benchmark Cross-Project Defect Prediction Approaches

A benchmark for CPDP is provided and it is determined that an approach proposed by Camargo Cruz and Ochimizu (2009) based on data standardization performs best and is always ranked among the statistically significant best results for all metrics and data sets.

A Bayesian Based Method for Agile Software Development Release Planning and Project Health Monitoring

A quantitative model for project health evaluation is presented that helps decision makers make the right decision early to correct any discrepancy that may hinder on-time, high-quality software delivery.

Sequential Model Optimization for Software Effort Estimation

This paper applies a configuration technique called “ROME” (Rapid Optimizing Methods for Estimation), which uses sequential model-based optimization (SMO) to find what configuration settings of effort estimation techniques work best for a particular data set.