Corpus ID: 19035158

Tupleware: Distributed Machine Learning on Small Clusters

  • Andrew Crotty, Alex Galakatos, T. Kraska
  • Published 2014
  • Computer Science
  • IEEE Data Eng. Bull.
  • There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the challenges of the Googles and Facebooks of the world: petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users operate clusters ranging from a few to a few dozen nodes, analyze relatively small datasets of up to several terabytes in size, and perform primarily compute…
    17 Citations


    Resource Elasticity for Large-Scale Machine Learning
    • 41 citations
    HyPerInsight: Data Exploration Deep Inside HyPer
    • 6 citations
    KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics
    • 100 citations
    SampleClean: Fast and Reliable Analytics on Dirty Data
    • 30 citations
    End-to-End Large Scale Machine Learning with KeystoneML
    • 6 citations
    RLEX: A DBMS For Reinforcement Learning
    • 2016
    RLEX: Safety and Data Quality in Reinforcement Learning-based and Adaptive Systems
    ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning
    • 26 citations
    Two Decades of AI4NETS - AI/ML for Data Networks: Challenges & Research Directions
    • P. Casas
    • Computer Science
    • NOMS 2020 - 2020 IEEE/IFIP Network Operations and Management Symposium
    • 2020

