On the Development and Distribution of R Packages: An Empirical Analysis of the R Ecosystem

  title={On the Development and Distribution of R Packages: An Empirical Analysis of the R Ecosystem},
  author={Alexandre Decan and Tom Mens and Ma{\"e}lick Claes and Philippe Grosjean},
  journal={Proceedings of the 2015 European Conference on Software Architecture Workshops},
  • Alexandre Decan, Tom Mens, Maëlick Claes, Philippe Grosjean
  • Published 7 September 2015
This paper explores the ecosystem of software packages for R, one of the most popular environments for statistical computing today. [...] Key result: with this analysis, we provide deeper insight into the extent and the evolution of the R package ecosystem.


Evolution of the R software ecosystem: Metrics, relationships, and their impact on qualities

An Empirical Analysis of the R Package Ecosystem

The data, methods, and calculations herein provide an anchor for public discourse and industry decisions related to R and CRAN, serving as a foundation for future research on the R software ecosystem and "data science" more broadly.

Evolution and prospects of the Comprehensive R Archive Network (CRAN) package ecosystem

An empirical analysis of the evolution of the CRAN repository over the last 20 years is provided, considering the laws of software evolution and the effect of CRAN's policies on that development; the analysis suggests a notable increase in complexity in recent years.

An empirical exploration of the vibrant R ecosystem

It was found that, although R originated in statistics, its development benefited greatly from software developers and users from various disciplines such as agricultural, biological, environmental, and medical science.

When GitHub Meets CRAN: An Analysis of Inter-Repository Package Dependency Problems

This work explores how the use of GitHub influences the R ecosystem, both for the distribution of R packages and for inter-repository package dependency management.

An empirical comparison of dependency network evolution in seven software packaging ecosystems

It is observed that the dependency networks tend to grow over time, both in size and in number of package updates, while a minority of packages are responsible for most of the package updates.

Evolution of a Haskell repository and its use of monads: an exploratory study of Stackage

This paper presents an empirical study of the evolution of the packages available in fourteen Long-Term Support releases (2014–2020), including their use of monads from the mtl package, which provides the standard monad core (e.g., state, reader, continuations).

An empirical comparison of dependency issues in OSS packaging ecosystems

An empirical analysis is presented of how the dependency graphs of three large packaging ecosystems (npm, CRAN, and RubyGems) evolve over time, and of how existing package dependencies affect the resilience of each ecosystem.

Ten simple rules for finding and selecting R packages

R is an increasingly preferred software environment for data analytics and statistical computing among scientists and practitioners. Packages markedly extend R’s utility and ameliorate inefficient [...]

The Evolution of the R Software Ecosystem

This work explores the evolution characteristics of the statistical computing project GNU R and finds that the ecosystem of user-contributed R packages has been growing steadily since R's conception, at a significantly faster rate than the core packages, while each individual package remains stable in size.

Are There Too Many R Packages?

It is argued that the statistical computing community needs a more common understanding of software quality, and better domain-specific semantic resources.

Possible Directions for Improving Dependency Versioning in R

This paper explores the general lack of dependency versioning in the infrastructure of R in greater detail, and suggests approaches taken by other open source communities that might work for R as well.

The Popularity of Data Analysis Software

Various ways of measuring the popularity or market share of BMDP, JMP, Minitab, R, R-PLUS, Revolution R, S-PLUS, SAS, SPSS, Stata, Statistica, and Systat are presented, as well as of two implementations of the SAS language, Carolina and WPS.

maintaineR: A Web-Based Dashboard for Maintainers of CRAN Packages

The R development community maintains thousands of packages through its Comprehensive R Archive Network (CRAN). The growth and evolution of this archive makes it more and more difficult to maintain [...]

Structural Complexity and Decay in FLOSS Systems: An Inter-repository Study

This study investigates whether the structure of a FLOSS system and its decay can also be influenced by the repository that hosts it, and shows that the repository hosting larger and more active projects exhibits more complex structures.

GHTorrent: Github's data from a firehose

GHTorrent aims to create a scalable offline mirror of GitHub's event streams and persistent data, and to offer it to the research community as a service.

Continuous Integration in a Social-Coding World: Empirical Evidence from GitHub

This paper explores how GitHub developers use continuous integration as well as whether the contribution type (direct versus indirect) and different project characteristics (e.g., main programming language, or project age) are associated with the success of the automatic builds.

The promises and perils of mining git

This work focuses on git, a very popular decentralized source code management (DSCM) system used in high-profile projects, and aims to help researchers interested in DSCMs avoid perils when mining and analyzing git data.