Revisiting Dockerfiles in Open Source Software Over Time
@article{Eng2021RevisitingDI,
  title={Revisiting Dockerfiles in Open Source Software Over Time},
  author={Kalvin Eng and Abram Hindle},
  journal={2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)},
  year={2021},
  pages={449--459}
}
Docker is becoming ubiquitous with containerization for developing and deploying applications. Previous studies have analyzed Dockerfiles that are used to create container images in order to better understand how to improve Docker tooling. These studies obtain Dockerfiles using either Docker Hub or GitHub. In this paper, we revisit the findings of previous studies using the largest set of Dockerfiles known to date with over 9.4 million unique Dockerfiles found in the World of Code…
References
A Large-scale Data Set and an Empirical Study of Docker Images Hosted on Docker Hub
- Computer Science, 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME)
- 2020
The results demonstrate the maturity of the Docker ecosystem: more reliance on ready-to-use language and application base images as opposed to yet-to-be-configured OS images, a downward trend of Docker image sizes demonstrating the adoption of best practices of keeping images small, and a declining trend in the number of smells suggesting a general improvement in quality.
An Empirical Analysis of the Docker Container Ecosystem on GitHub
- Computer Science, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR)
- 2017
An exploratory empirical study on the Docker ecosystem, prevalent quality issues, and the evolution of Dockerfiles finds that most quality issues arise from missing version pinning, and proposes to introduce an abstraction that could deal with the intricacies of different package managers and could improve migration to more light-weight images.
A clustering-based approach for mining dockerfile evolutionary trajectories
- Computer Science, Science China Information Sciences
- 2018
The potential to implement the best practices through the analysis of the dockerfile evolutionary trajectories motivated this work.
An Insight Into the Impact of Dockerfile Evolutionary Trajectories on Quality and Latency
- Computer Science, 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC)
- 2018
An empirical study on a large dataset of 2,840 projects to shed light on the impact of dockerfile evolutionary trajectories on quality and latency in Docker-based containerization, which derives a number of suggestions for practitioners.
Characterizing the Occurrence of Dockerfile Smells in Open-Source Software: An Empirical Study
- Computer Science, IEEE Access
- 2020
An empirical study on a large dataset of 6,334 projects to help developers gain some insights into the occurrence of Dockerfile smells, including their coverage, distribution, co-occurrence, and correlation with project characteristics.
Learning from, Understanding, and Supporting DevOps Artifacts for Docker
- Computer Science, 2020 IEEE/ACM 42nd International Conference on Software Engineering (ICSE)
- 2020
A toolset, binnacle, is introduced that enabled the authors to ingest 900,000 GitHub repositories and to learn rules and an analyzer that can be used to aid developers in the IDE when creating Dockerfiles, and in a post-hoc fashion to identify issues in, and to improve, existing Dockerfiles.
World of Code: An Infrastructure for Mining the Universe of Open Source VCS Data
- Computer Science, 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)
- 2019
A very large and frequently updated collection of version control data for FLOSS projects named World of Code (WoC), which is capable of supporting trend evaluation, ecosystem measurement, and the determination of package usage, and is expected to spur investigation into global properties of OSS development leading to increased resiliency of the entire OSS ecosystem.
A Complete Set of Related Git Repositories Identified via Community Detection Approaches Based on Shared Commits
- Computer Science, MSR
- 2020
The approach successfully reduces the size of the megacluster, with the largest group of highly interconnected projects containing under 400K repositories, and the authors expect that the resulting map of related projects as well as tools and methods to handle the very large graph will serve as a reference set for mining software projects and other applications.
Curating GitHub for engineered software projects
- Computer Science, Empirical Software Engineering
- 2017
This work proposes a framework, and presents a reference implementation of the framework as a tool called reaper, to enable researchers to select GitHub repositories that contain evidence of an engineered software project. It also identifies software engineering practices (called dimensions) and proposes means for validating their existence in a GitHub repository.
Determining sample size.
- Mathematics, Journal of Hand Therapy: Official Journal of the American Society of Hand Therapists
- 1995