Corpus ID: 52988622

Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data

@inproceedings{Essertel2018FlareOA,
  title={Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data},
  author={Gr{\'e}gory M. Essertel and Ruby Y. Tahboub and James M. Decker and K. Brown and K. Olukotun and Tiark Rompf},
  booktitle={OSDI},
  year={2018}
}
  • Grégory M. Essertel, Ruby Y. Tahboub, James M. Decker, K. Brown, K. Olukotun, Tiark Rompf
  • Published in OSDI 2018
  • Computer Science
  • In recent years, Apache Spark has become the de facto standard for big data processing. Spark has enabled a wide audience of users to process petabyte-scale workloads due to its flexibility and ease of use: users are able to mix SQL-style relational queries with Scala or Python code, and have the resultant programs distributed across an entire cluster, all without having to work with low-level parallelization or network primitives. However, many workloads of practical importance are not…
    26 Citations
    Dynamic speculative optimizations for SQL compilation in Apache Spark
    • 2
    Gerenuk: thin computation over big native data using speculative program transformation
    • 4
    Analyzing and Optimizing Java Code Generation for Apache Spark Query Plan
    • 1
    Flare & Lantern: Efficiently Swapping Horses Midstream
    • 2
    Polystore++: Accelerated Polystore System for Heterogeneous Workloads
    • 2
    Architecting a Query Compiler for Spatial Workloads
    • 3
    Grizzly: Efficient Stream Processing Through Adaptive Query Compilation
    • 7
