Locality-Aware GC Optimisations for Big Data Workloads
@inproceedings{Patrcio2017LocalityAwareGO,
  title={Locality-Aware GC Optimisations for Big Data Workloads},
  author={Duarte Patr{\'i}cio and Rodrigo Bruno and Jos{\'e} Sim{\~a}o and Paulo Ferreira and Lu{\'i}s Veiga},
  booktitle={OTM Conferences},
  year={2017}
}
Many Big Data analytics and IoT scenarios rely on fast, non-relational (NoSQL) storage to help process massive amounts of data. In addition, managed runtimes (e.g., the JVM) are now widely used to run these NoSQL storage solutions, particularly for Big Data applications driven by key-value stores. However, the benefits of such runtimes can be limited by automatic memory management, i.e., Garbage Collection (GC), which does not consider object locality, resulting…
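The abstract breaks off before it describes the proposed optimisations, so the following is only a toy model of the problem it names (GC that ignores object locality), not the authors' technique. A copying collector decides where live objects land by the order in which it evacuates them. A Cheney-style breadth-first copy places every key-value entry first and their key and value objects much later, whereas a depth-first order places each entry right next to the objects it references. The heap layout, object names, and helper functions are hypothetical.

```python
from collections import deque

def build_store(n_entries):
    """Hypothetical heap of a key-value store: a root table references n entries,
    and each entry references one key object and one value object."""
    heap = {"table": [f"entry{i}" for i in range(n_entries)]}
    for i in range(n_entries):
        heap[f"entry{i}"] = [f"key{i}", f"val{i}"]
        heap[f"key{i}"] = []
        heap[f"val{i}"] = []
    return heap

def copy_order_bfs(heap, root):
    """Cheney-style breadth-first evacuation order (to-space slot = list index)."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        obj = queue.popleft()
        order.append(obj)
        for child in heap[obj]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return order

def copy_order_dfs(heap, root):
    """Depth-first evacuation order: an object's children are copied right after it."""
    order, seen, stack = [], {root}, [root]
    while stack:
        obj = stack.pop()
        order.append(obj)
        for child in reversed(heap[obj]):  # reversed so the first child is popped first
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return order

def mean_parent_child_distance(heap, order):
    """Average distance, in object slots, between each entry and the objects it references."""
    addr = {obj: i for i, obj in enumerate(order)}
    dists = [abs(addr[c] - addr[p]) for p in order if p != "table" for c in heap[p]]
    return sum(dists) / len(dists)

heap = build_store(1000)
print(mean_parent_child_distance(heap, copy_order_bfs(heap, "table")))  # 1500.0
print(mean_parent_child_distance(heap, copy_order_dfs(heap, "table")))  # 1.5
```

In this model the breadth-first distance grows linearly with the number of entries (1.5 × n slots), while the depth-first distance stays constant, which is why traversal order matters for cache and page locality when a store reads an entry and then its value.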
3 Citations
A Performance Comparison of Modern Garbage Collectors for Big Data Environments
- Computer Science
- 2021
This project aims to understand how different garbage collectors scale in terms of throughput, latency, and memory usage in memory-hungry environments, so that, given a platform with particular performance needs, the most suitable garbage collection algorithm can be identified.
You Can’t Hide You Can’t Run
- Computer Science
- 2020
A new profiling tool, PerfUtil, is developed to study, characterize, and better understand why benchmarks perform sub-optimally on NUMA machines; its effectiveness rests on its ability to track numerous events throughout the system at the managed runtime system level.
You can’t hide you can’t run: a performance assessment of managed applications on a NUMA machine
- Computer Science
- MPLR
- 2020
PerfUtil is a new profiling tool that helps demystify NUMA peculiarities and accurately characterize the profiles of managed applications; its effectiveness rests on its ability to track numerous events throughout the system at the managed runtime system level.
References (first 10 of 28 shown)
A bloat-aware design for big data applications
- Computer Science
- ISMM '13
- 2013
Experimental results show that this new design paradigm is extremely effective in improving performance: even for the moderate-size data sets processed, it yields performance gains of 2.5x or more, and the improvement grows substantially with the size of the data set.
NG2C: pretenuring garbage collection with dynamic generations for HotSpot big data applications
- Computer Science
- 2017
NG2C, a new GC algorithm that combines pretenuring with user-defined dynamic generations, is proposed. It decreases the worst observable GC pause time and avoids object promotion and heap fragmentation, both of which are responsible for most of the duration of HotSpot GC pauses.
NumaGiC: a Garbage Collector for Big Data on Big NUMA Machines
- Computer Science
- ASPLOS 2015
- 2015
NumaGiC is a GC with a mostly distributed design that improves overall performance and increases the performance of the collector itself by up to 3.6x over NAPS and up to 5.4x over Parallel Scavenge.
FACADE: A Compiler and Runtime for (Almost) Object-Bounded Big Data Applications
- Computer Science
- ASPLOS 2015
- 2015
Facade is a novel compiler framework that generates highly efficient data-manipulation code by automatically transforming the data path of an existing Big Data application, significantly reducing memory management cost and improving scalability.
Benchmarking cloud serving systems with YCSB
- Computer Science
- SoCC '10
- 2010
This work presents the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems, and defines a core set of benchmarks and reports results for four widely used systems.
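YCSB's core workloads choose which record to read or update from a configurable request distribution (the `requestdistribution` property); with `zipfian`, a small set of hot keys receives most of the requests, which is the kind of skewed key-value access a locality-aware GC can exploit. The sketch below samples record indices from a Zipfian distribution by inverse-CDF lookup, assuming a Zipfian constant of 0.99 (YCSB's default). It is not YCSB's own `ZipfianGenerator`, which uses a different closed-form method, and the helper names are hypothetical.

```python
import bisect
import collections
import itertools
import random

def zipfian_cdf(n_items, theta=0.99):
    """Cumulative probabilities for ranks 0..n_items-1, where P(rank k) is
    proportional to 1 / (k + 1) ** theta."""
    weights = [1.0 / (k + 1) ** theta for k in range(n_items)]
    total = sum(weights)
    cdf = [w / total for w in itertools.accumulate(weights)]
    cdf[-1] = 1.0  # guard against floating-point rounding in the last bucket
    return cdf

class ZipfianKeyChooser:
    """Hypothetical helper: draws record indices so that low ranks (hot keys) dominate."""

    def __init__(self, n_items, theta=0.99, seed=None):
        self.cdf = zipfian_cdf(n_items, theta)
        self.rng = random.Random(seed)

    def next_key(self):
        # Inverse-CDF sampling: the first rank whose cumulative probability is >= u.
        u = self.rng.random()
        return bisect.bisect_left(self.cdf, u)

# Usage: 100,000 requests over 1,000 records; rank 0 is requested most often.
chooser = ZipfianKeyChooser(1000, seed=42)
counts = collections.Counter(chooser.next_key() for _ in range(100_000))
print(counts.most_common(3))
```

Inverse-CDF lookup costs O(log n) per draw after an O(n) setup, which is simple and adequate for a toy model; generators meant for very large key spaces avoid materializing the whole table.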
Taurus: A Holistic Language Runtime System for Coordinating Distributed Managed-Language Applications
- Computer Science
- ASPLOS 2016
- 2016
Taurus is a drop-in replacement for the JVM that requires almost no configuration and can run unmodified, off-the-shelf Java applications; it enforces user-defined coordination policies and provides a DSL for writing these policies.
A Checkpointing-enabled and Resource-Aware Java VM for Efficient and Robust e-Science Applications in Grid Environments
- Computer Science
- 2013
This article provides a solution for Java applications with long execution times: it extends a Java VM (Jikes RVM) with checkpointing and migration mechanisms to make applications more robust and flexible.
Profile-guided proactive garbage collection for locality optimization
- Computer Science
- PLDI '06
- 2006
A new system that continuously improves program data locality at run time with low overhead. It proactively reorganizes the heap by leveraging the garbage collector, and uses profile information collected through a low-overhead mechanism to guide the reorganization at run time.
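As a rough illustration of profile-guided heap reorganization (not the paper's implementation), a copying collector can consult a profile of how often each pointer field is followed and evacuate an object's hottest referent immediately after it, so the most frequently traversed pointer targets an adjacent slot. The access trace, the `field_hotness` profile, and the function name below are hypothetical.

```python
from collections import Counter

def profile_guided_copy_order(heap, roots, field_hotness):
    """Depth-first evacuation order in which each object's children are copied
    from hottest to coldest, according to profiled (parent, child) access counts."""
    order, seen = [], set()
    stack = list(reversed(roots))  # reversed so the first root is copied first
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue  # already evacuated via another path
        seen.add(obj)
        order.append(obj)
        # Stable sort: children with equal hotness keep their field order.
        hottest_first = sorted(heap[obj], key=lambda c: field_hotness[(obj, c)], reverse=True)
        # Push coldest first so the hottest child is popped (copied) next.
        stack.extend(reversed(hottest_first))
    return order

# Toy heap: "node" references "cold" via its first field and "hot" via its second.
heap = {"node": ["cold", "hot"], "cold": [], "hot": []}
# Hypothetical access trace: node -> hot is followed 9 times, node -> cold once.
field_hotness = Counter([("node", "hot")] * 9 + [("node", "cold")])

print(profile_guided_copy_order(heap, ["node"], Counter()))      # ['node', 'cold', 'hot']
print(profile_guided_copy_order(heap, ["node"], field_hotness))  # ['node', 'hot', 'cold']
```

Without a profile the order simply follows field declaration order; with one, the frequently followed `node -> hot` pointer ends up pointing to the very next slot.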
Cassandra: a decentralized structured storage system
- Computer Science
- OPSR
- 2010
Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of…
Ditto - Deterministic Execution Replayability-as-a-Service for Java VM on Multiprocessors
- Computer Science
- Middleware
- 2013
Ditto is a novel pair of recording and replaying algorithms that employ partial transitive reduction and program-order pruning on the fly, and that take advantage of TLO static analysis, escape analysis, and JVM compiler optimizations to identify thread-local accesses.