Corpus ID: 218581279

fplyr: the split-apply-combine strategy for big data in R

@article{Marotta2020fplyrTS,
  title={fplyr: the split-apply-combine strategy for big data in R},
  author={F. Marotta},
  journal={arXiv: Computation},
  year={2020}
}
  • F. Marotta
  • Published 2020
  • Computer Science, Mathematics
  • arXiv: Computation
We present fplyr, a new package for the R language for working with big files. It allows users to easily implement the split-apply-combine strategy for files that are too big to fit into the available memory, without relying on databases or introducing non-native R classes. A custom function can be applied independently to each group of observations, and the results may be either returned or printed directly to one or more output files.
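
The strategy described in the abstract can be illustrated with a short, language-neutral sketch. The Python code below is not fplyr's implementation or API; it only shows the general idea of streaming a delimited file, applying a function to one group at a time, and collecting the results, so that only a single group is held in memory. The function names (`split_apply_combine`, `mean_value`), the column names, and the assumption that rows of the same group are stored contiguously are all hypothetical choices made for this example.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def split_apply_combine(lines, func, key_index=0):
    """Yield (group_key, func(header, rows)) for each group in turn.

    Assumption for this sketch: rows belonging to the same group are
    contiguous in the input, so only one group is in memory at a time.
    """
    reader = csv.reader(lines)
    header = next(reader)  # first line holds the column names
    for key, rows in groupby(reader, key=itemgetter(key_index)):
        # "split": rows of one group; "apply": call the user function
        yield key, func(header, list(rows))


def mean_value(header, rows):
    """Example user function: mean of the hypothetical 'value' column."""
    col = header.index("value")
    return sum(float(r[col]) for r in rows) / len(rows)


# A small in-memory stand-in for a file too big to load at once.
sample = io.StringIO("group,value\na,1\na,3\nb,10\nb,20\nb,30\n")

# "combine": gather the per-group results into one dictionary.
results = dict(split_apply_combine(sample, mean_value))
print(results)  # {'a': 2.0, 'b': 20.0}
```

Because `split_apply_combine` is a generator, the per-group results could just as well be written to an output file as they are produced instead of being collected, which corresponds to the second option mentioned in the abstract.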
