Predicting health indicators for open source projects (using hyperparameter optimization)
@article{Xia2020PredictingHI, title={Predicting health indicators for open source projects (using hyperparameter optimization)}, author={Tianpei Xia and Wei Fu and Rui Shu and Rishabh Agrawal and Tim Menzies}, journal={Empirical Software Engineering}, year={2020}, volume={27} }
Software developed on public platforms is a source of data that can be used to make predictions about those projects. While individual development activity may be random and hard to predict, development behavior at the project level can be predicted with good accuracy when large groups of developers work together on software projects. To demonstrate this, we use 64,181 months of data from 1,159 GitHub projects to make various predictions about the recent status of those projects (as of April…
2 Citations
Optimizing Predictions for Very Small Data Sets: a case study on Open-Source Project Health Prediction
- Computer Science
- ArXiv
- 2023
The landscape analytics method SNEAK is presented; it is both faster and more effective than prior state-of-the-art hyperparameter optimization algorithms (FLASH, HYPEROPT, OPTUNA, and differential evolution) and might be useful in other “data-light” SE domains.
Approach to Formalizing Software Projects for Solving Design Automation and Project Management Tasks
- Computer Science, Engineering
- Software
- 2023
This paper describes a knowledge base model and a diagnostic analytics method for solving design automation and project management tasks, and presents example use cases for the proposed approach.
References
Monitoring the "health" status of open source web-engineering projects
- Computer Science
- Int. J. Web Inf. Syst.
- 2007
A concept of “health” indicators and an evaluation process that can help to get a status overview of OSS projects in a timely fashion and predict project survivability based on the project data available on web repositories are proposed.
Automated Parameter Optimization of Classification Techniques for Defect Prediction Models
- Computer Science
- 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE)
- 2016
This paper concludes that parameter settings can indeed have a large impact on the performance of defect prediction models, suggesting that researchers should experiment with the parameters of the classification techniques.
Is "Better Data" Better Than "Better Data Miners"?
- Computer Science
- 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)
- 2018
For software analytics tasks like defect prediction, data pre-processing can be more important than classifier choice; ranking studies are incomplete without such pre-processing, and SMOTUNED is a promising candidate for pre-processing.
The Impact of Automated Parameter Optimization on Defect Prediction Models
- Computer Science
- IEEE Transactions on Software Engineering
- 2019
It is found that traditionally overlooked techniques like C5.0 and neural networks can actually outperform widely-used techniques after optimization is applied, highlighting the importance of exploring the parameter space when using parameter-sensitive classification techniques.
Software effort estimation based on open source projects: Case study of Github
- Computer Science
- Inf. Softw. Technol.
- 2017
Predicting the number of forks for open source software project
- Computer Science
- EAST 2014
- 2014
This paper uses stepwise regression to design a model that predicts the number of forks for open source software projects on GitHub; the model has high prediction accuracy and lets users set the combination of time parameters to suit their own needs.
A Comparative Study to Benchmark Cross-Project Defect Prediction Approaches
- Computer Science
- IEEE Transactions on Software Engineering
- 2018
A benchmark for CPDP is provided and it is determined that an approach proposed by Camargo Cruz and Ochimizu (2009) based on data standardization performs best and is always ranked among the statistically significant best results for all metrics and data sets.
A Bayesian Based Method for Agile Software Development Release Planning and Project Health Monitoring
- Computer Science
- 2010 International Conference on Intelligent Networking and Collaborative Systems
- 2010
A quantitative model for project health evaluation is presented that helps decision makers act early to correct any discrepancy that may hinder on-time, high-quality software delivery.
Surgical teams on GitHub: Modeling performance of GitHub project development processes
- Computer Science
- Inf. Softw. Technol.
- 2018
Sequential Model Optimization for Software Effort Estimation
- Computer Science
- IEEE Transactions on Software Engineering
- 2022
This paper applies a configuration technique called “ROME” (Rapid Optimizing Methods for Estimation), which uses sequential model-based optimization (SMO) to find which configuration settings of effort estimation techniques work best for a particular data set.