Riposte: A trace-driven compiler and parallel VM for vector code in R

@article{Talbot2012RiposteAT,
  title={Riposte: A trace-driven compiler and parallel VM for vector code in R},
  author={Justin Talbot and Zach DeVito and Pat Hanrahan},
  journal={2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT)},
  year={2012},
  pages={43--51}
}
  • Justin Talbot, Zach DeVito, Pat Hanrahan
  • Published 19 September 2012
  • Computer Science
  • 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT)
There is a growing utilization gap between modern hardware and modern programming languages for data analysis. Due to power and other constraints, recent processor design has sought improved performance through increased SIMD and multi-core parallelism. At the same time, high-level, dynamically typed languages for data analysis have become popular. These languages emphasize ease of use and high productivity, but have, in general, low performance and limited support for exploiting hardware… 
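
To make the setting concrete, here is a minimal sketch in plain R (no Riposte-specific API; the variable names are illustrative) of the vector-style code the paper targets: each whole-vector operation below would, in a naive interpreter, materialize an intermediate vector, while a trace-driven vector compiler can fuse the expression and run it as a single SIMD, multicore pass over the data.

  # Plain R: every arithmetic operator here is a whole-vector operation.
  n <- 1e7
  x <- runif(n)
  y <- runif(n)
  # A naive interpreter allocates an intermediate vector per operator; a
  # fusing compiler can evaluate the whole expression in one pass.
  dist2 <- (x - mean(x))^2 + (y - mean(y))^2
  frac_close <- sum(dist2 < 0.1) / n   # reduction over the fused result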

Citations

Accelerating Dynamically-Typed Languages on Heterogeneous Platforms Using Guards Optimization

This paper presents MegaGuards, a new approach for speculatively executing dynamic languages on heterogeneous platforms in a fully automatic and transparent manner; it removes guards from compute-intensive loops and improves sequential performance.

Just-In-Time GPU Compilation for Interpreted Languages with Partial Evaluation

This paper uses just-in-time compilation to transparently and automatically offload computations from interpreted dynamic languages to heterogeneous devices and shows that when taking into account start-up time, large speedups are achievable, even when the applications run for as little as a few seconds.

Parallelizing Julia with a Non-Invasive DSL

ParallelAccelerator is presented, a library and compiler for high-level, high-performance scientific computing in Julia that exposes the implicit parallelism in high-level array-style programs and compiles them to fast, parallel native code.

Optimizing R VM: Allocation Removal and Path Length Reduction via Interpreter-level Specialization

A first classification of R programming styles into Type I (looping over data), Type II (vector programming), and Type III (glue code) is introduced, showing that the most serious overheads of R manifest mostly in Type I code, whereas much Type III code can be quite fast.
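
As a rough illustration (the function names are hypothetical, not from the paper), the first two styles look like this in R:

  # Type I: looping over data, paying interpreter overhead per element.
  cube_type1 <- function(v) {
    out <- numeric(length(v))
    for (i in seq_along(v)) out[i] <- v[i]^3
    out
  }
  # Type II: vector programming, one whole-vector operation.
  cube_type2 <- function(v) v^3
  stopifnot(all.equal(cube_type1(1:10), cube_type2(1:10)))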

Just-in-time Length Specialization of Dynamic Vector Code

A trace-based just-in-time compilation strategy is presented that performs partial length specialization of dynamically typed vector code, avoiding excessive compilation overhead while still enabling the generation of efficient machine code through length-based optimizations.
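
A small sketch in plain R (the axpy helper is hypothetical) of why length matters to such a compiler: vector lengths are known only at run time, short operands trigger R's recycling rules, and long vectors deserve a vectorized fast path, so specializing on a few length classes rather than on every exact length keeps compilation overhead bounded.

  axpy <- function(a, x, y) a * x + y
  axpy(2, 1:4, 10)                  # y has length 1 and is recycled
  axpy(2, runif(1e6), runif(1e6))   # long vectors: worth a compiled fast path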

Dynamic page sharing optimization for the R language

This work presents a low-overhead page-sharing approach for R that significantly reduces the interpreter's memory overhead; concentrating on the most rewarding optimizations avoids the high runtime overhead of existing generic approaches to memory deduplication or compression.

ROSA: R Optimizations with Static Analysis

ROSA is presented, a static analysis framework that improves the performance and space efficiency of R programs, with substantial reductions in execution time and memory consumption over both CRAN R and Microsoft R Open.

Run-time data analysis to drive compiler optimizations

  • Sebastian Kloibhofer
  • Computer Science
    Companion Proceedings of the 2021 ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications: Software for Humanity
  • 2021
This project proposes integrating data analysis into a dynamic runtime to speed up big data applications, using the detailed run-time information for speculative compiler optimizations based on the shape and complexion of the data.

Contextual dispatch for function specialization

This paper proposes an approach to further the specialization of dynamic language compilers by disentangling classes of behaviors into separate optimization units, and describes a compiler for the R language that uses this approach.

Optimizing R language execution via aggressive speculation

Novel optimizations, backed by aggressive speculation techniques, are described and implemented within FastR, an alternative R language implementation that utilizes Truffle, a JVM-based language development framework developed at Oracle Labs.

References

SHOWING 1-10 OF 39 REFERENCES

Intel's Array Building Blocks: A retargetable, dynamic compiler and embedded language

This paper introduces Intel® Array Building Blocks (ArBB), which is a retargetable dynamic compilation framework that focuses on making it easier to write and port programs so that they can harvest data and thread parallelism on both multi-core and heterogeneous many-core architectures, while staying within standard C++.

ispc: A SPMD compiler for high-performance CPU programming

  • M. Pharr, W. Mark
  • Computer Science
    2012 Innovative Parallel Computing (InPar)
  • 2012
A compiler, the Intel® SPMD Program Compiler (ispc), is developed that delivers very high performance on CPUs thanks to effective use of both multiple processor cores and SIMD vector units.

Compiling for stream processing

A compiler for stream programs is presented that efficiently schedules computational kernels and stream memory operations and allocates on-chip storage; it overlaps memory operations and manages local storage so that 78% to 96% of program execution time is spent running computational kernels.

Copperhead: compiling an embedded data parallel language

The language, compiler, and runtime features that enable Copperhead to efficiently execute data parallel code are discussed and the program analysis techniques necessary for compiling Copperhead code into efficient low-level implementations are introduced.

HotpathVM: an effective JIT compiler for resource-constrained devices

A just-in-time compiler for a Java VM is presented that is small enough to fit on resource-constrained devices, yet is surprisingly effective; benchmarks show a speedup that in some cases rivals heavyweight just-in-time compilers.

Harnessing the Multicores: Nested Data Parallelism in Haskell

This talk describes Data Parallel Haskell, which embodies nested data parallelism in a modern, general-purpose language, implemented in a state-of-the-art compiler, GHC, focusing particularly on the vectorisation transformation, which transforms nested data parallelism to flat data parallelism.

Lazy binary-splitting: a run-time adaptive work-stealing scheduler

Lazy Binary Splitting is presented, a user-level scheduler of nested parallelism for shared-memory multiprocessors that builds on existing eager-binary-splitting work stealing but improves performance and ease of programming.

Scalable aggregation on multicore processors

This paper aims to provide a solution to performing in-memory parallel aggregation on the Intel Nehalem architecture, and considers several previously proposed techniques, including a hybrid independent/shared method and a method that clones data items automatically when contention is detected.

McFLAT: A Profile-Based Framework for MATLAB Loop Analysis and Transformations

A new framework, McFLAT, is presented, which uses profile-based training runs to determine likely loop-bound ranges for which specialized versions of the loops may be generated, and which ranges are worth specializing using a variety of loop transformations.

Trace-based just-in-time type specialization for dynamic languages

This work presents an alternative compilation technique for dynamically-typed languages that identifies frequently executed loop traces at run-time and then generates machine code on the fly that is specialized for the actual dynamic types occurring on each path through the loop.
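
As a final hedged sketch (plain R; the function name is hypothetical), the behavior this technique exploits is that the same hot code runs under different dynamic types, so each recorded trace is specialized to, and guarded by, the types actually observed on its path:

  scale_by <- function(v, k) v * k
  scale_by(runif(1e6), 2.5)   # observed double * double: one specialized trace
  scale_by(1:1e6, 3L)         # observed integer * integer: a separate trace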