The DaCapo benchmarks: Java benchmarking development and analysis

Authors: Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khan, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony Lloyd Hosking, Maria Jump, Han Bok Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanovic, Thomas VanDrunen, Daniel von Dincklage, Ben Wiedermann
Published in: Conference on Object-Oriented Programming Systems, Languages, and Applications
Since benchmarks drive computer science research and industry product development, which ones we use and how we evaluate them are key questions for the community. Despite complex runtime tradeoffs due to dynamic compilation and garbage collection required for Java programs, many evaluations still use methodologies developed for C, C++, and Fortran. SPEC, the dominant purveyor of benchmarks, compounded this problem by institutionalizing these methodologies for their Java benchmark suite. This… 

A scalability benchmark suite for Erlang/OTP

The main aspects of the design and current status of bencherl, a publicly available scalability benchmark suite for applications written in Erlang, are presented, along with the scalability dimensions the suite aims to examine, its infrastructure, and its current set of benchmarks.

Characterizing a Complex J2EE Workload: A Comprehensive Analysis and Opportunities for Optimizations

  • Yefim Shuf, I. Steiner
  • Computer Science
    2007 IEEE International Symposium on Performance Analysis of Systems & Software
  • 2007
An analysis of a significantly more complex 3-Tier J2EE (Java 2 Enterprise Edition) commercial workload, SPECjAppServer2004, finds that CPI is strongly correlated with branch mispredictions, translation misses, instruction cache misses, and bursty data cache misses that trigger data prefetching.

Renaissance: benchmarking suite for parallel applications on the JVM

Renaissance, a new benchmark suite composed of modern, real-world, concurrent, and object-oriented workloads that exercise various concurrency primitives of the JVM, is presented, and it is shown that the use of concurrency primitives in these workloads reveals optimization opportunities that were not visible with the existing workloads.

A Comprehensive Java Benchmark Study on Memory and Garbage Collection Behavior of DaCapo, DaCapo Scala, and SPECjvm2008

The memory characteristics and the GC behavior of commonly used Java benchmarks, i.e., the DaCapo benchmark suite, the DaCapo Scala benchmark suite, and the SPECjvm2008 benchmark suite, are described.

Profiling and Tracing Support for Java Applications

The feasibility of undertaking performance evaluations for JVMs using a hybrid JVM/OS tool, such as async-profiler, OS-centric profiling and tracing tools based on Linux perf, and the Extended Berkeley Packet Filter Tracing framework is demonstrated.

A Sampling Microarchitecture Simulator for Java Workloads

The Dynamic SimpleScalar (DSS) simulator is enhanced to support contemporary Java benchmark workloads, and statistical simulation sampling is implemented in DSS to reduce simulation time with minimal loss of accuracy.

Wake up and smell the coffee: evaluation methodology for the 21st century

The consequences of the authors' collective inattention to methodology on innovation are explored, recommendations for addressing this problem in one domain are made, and guidelines for other domains are provided.

A Rigorous Benchmarking and Performance Analysis Methodology for Python Workloads

A statistically rigorous benchmarking and performance analysis methodology for Python workloads is proposed, which makes a distinction between start-up and steady-state performance and which summarizes average performance across a set of benchmarks using the harmonic mean speedup.

Automated construction of JavaScript benchmarks

JSBench, a flexible tool for workload capture and benchmark generation, is described, and its use in creating eight benchmarks based on popular sites is demonstrated, showing that workloads created by JSBench match the behavior of the original web applications.

Cross-language compiler benchmarking: are we fast yet?

This paper presents 14 benchmarks and a novel methodology to assess compiler effectiveness across language implementations, and argues that these benchmarks help language implementers identify performance bugs and optimization potential by comparing against other language implementations.



Dynamic metrics for Java

A set of unambiguous, dynamic, robust and architecture-independent metrics that can be used to categorize programs according to their dynamic behaviour in five areas: size, data structure, memory use, concurrency, and polymorphism are defined and measured.

Using complete system simulation to characterize SPECjvm98 benchmarks

The Java code is seen to limit exploitable parallelism, and aggressive instruction issue is seen to be less efficient for SPECjvm98 benchmarks than for SPEC95 programs.

Java Runtime Systems: Characterization and Architectural Implications

The architectural issues explored in this study show that, when Java applications are executed with a JIT compiler, selective translation using good heuristics can improve performance, but the saving is only 10-15 percent at best; the study also offers insights and architectural proposals for designing an efficient Java runtime system.

How Java programs interact with virtual machines at the microarchitectural level

The goal of this paper is to study the complex interaction between the Java application, its input, and the virtual machine it runs on at the microarchitectural level by measuring a large number of performance characteristics using performance counters on an AMD K7 Duron microprocessor.

Memory system behavior of Java programs: methodology and analysis

This paper studies the memory system behavior of Java programs by analyzing memory reference traces of several SPECjvm98 applications running with a Just-In-Time (JIT) compiler and finds that the overall cache miss ratio is increased due to garbage collection, which suffers from more cache misses than the application.

Measuring benchmark similarity using inherent program characteristics

From the study of the similarity between the four generations of SPEC CPU benchmark suites, it is found that, other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have more or less remained unchanged.

Characterizing the memory behavior of Java workloads: a structured view and opportunities for optimizations

It is found that co-allocation of frequently used method tables can reduce the number of TLB misses and lower the cost of accessing type information block entries in virtual method calls and runtime type checking.

The measured cost of copying garbage collection mechanisms

This study covers both low-level object representation and copying issues as well as the mechanisms needed to support more advanced techniques such as generational collection, large object spaces, and type segregated areas.

Oil and water? High performance garbage collection in Java with MMTk

MMTk is an efficient, composable, extensible, and portable framework for building garbage collectors that uses design patterns and compiler cooperation to combine modularity and efficiency and suggests that performance critical software can embrace modular design and high-level languages.

A Study of the Allocation Behavior of the SPECjvm98 Java Benchmark

An analysis of the memory usage for six of the Java programs in the SPECjvm98 benchmark suite finds that non-pointer data usually represents more than 50% of the allocated space for instance objects, that Java objects tend to live longer than objects in Smalltalk or ML, and that they are fairly small.