Corpus ID: 15505613

Performance comparison between Java and JNI for optimal implementation of computational micro-kernels

Nassim A. Halli, Henri-Pierre Charles, Jean-François Méhaut
General purpose CPUs used in high performance computing (HPC) support a vector instruction set and an out-of-order engine dedicated to increasing instruction-level parallelism. Hence, related optimizations are currently critical to improve the performance of applications requiring numerical computation. Moreover, the use of a Java run-time environment such as the HotSpot Java Virtual Machine (JVM) in high performance computing is a promising alternative. It benefits from its programming…
SparkJNI: A Toolchain for Hardware Accelerated Big Data Apache Spark
  • T. Voicu, Z. Al-Ars
  • Computer Science
  • 2019 IEEE 4th International Conference on Big Data Analytics (ICBDA)
  • 2019
This paper analyzes the state-of-the-art developments in the field of heterogeneously accelerated Spark, and proposes SparkJNI, a framework for JNI accelerated Spark that enables accelerated execution through native code integration by automatically generating C++ code wrappers for easy code development by the programmer.
Explicit SIMD instructions into JVM using LMS
This work proposes a systematic approach to automatically generate support for SIMD instructions from a given specification, and addresses the modular aspect of LMS, implementing each instruction as part of an ISA-specific extensible DSL.
Dynamic speculative optimizations for SQL compilation in Apache Spark
This paper presents a new approach to query compilation that overcomes limitations by relying on run-time profiling and dynamic code generation, leading to speedups of up to 4.4x on the TPC-H benchmark with textual-form data formats such as CSV or JSON.
Towards dynamic SQL compilation in Apache Spark
Apache Spark's code generation suffers from runtime overheads related to data de-serialization during query execution, which can be significant, especially when applications operate on human-readable data formats such as CSV or JSON.
Dynamic Configuration of a Relocatable Driver and Code Generator for Continuous Deep Analytics
Modern stream processing engines usually use the Java virtual machine (JVM) as execution platform. The JVM increases portability and safety of applications at the cost of not fully utilising the pe…
Desbordante: a Framework for Exploring Limits of Dependency Discovery Algorithms
Desbordante is presented: a platform intended to make the most of the available computational resources and thus to be more suitable for industrial use. The authors pose a number of research questions related to the obtained performance and justify its necessity.
ART vs. NDK vs. GPU acceleration: A study of performance of image processing algorithms on Android
The Android ecosystem contains three major execution platforms suitable for different purposes. Android applications are normally written in the Java programming language, but more computationally intensive applications are developed.
Optimisation de code pour application Java haute-performance. (Code optimization for high-performance Java application)
Java is to date one of the most widely used languages, if not the most widely used, across all categories of programming, and its popularity for the development of scientific applications…


Java for high performance computing: assessment of current research and practice
This paper analyzes the current state of Java for HPC, both for shared and distributed memory programming, presents related research projects, and evaluates the performance of current Java HPC solutions and research developments on a multi-core cluster with a high-speed network, InfiniBand, and a 24-core shared memory machine.
Vapor SIMD: Auto-vectorize once, run everywhere
This work presents a synergistic auto-vectorizing compilation scheme that leverages the optimized intermediate results provided by the first stage across disparate SIMD architectures from different vendors, having distinct characteristics ranging from different vector sizes, memory alignment and access constraints, to special computational idioms.
Java programming for high-performance numerical computing
Programming techniques that lead to Java numerical codes with performance comparable to FORTRAN or C, the more traditional languages for this field, are discussed.
Performance potential of optimization phase selection during dynamic JIT compilation
This work determines that program-wide and method-specific phase selection in the HotSpot JIT compiler can produce ideal steady-state performance gains, and finds that existing state-of-the-art heuristic solutions are unable to realize these performance gains; and develops a robust open-source production-quality framework to further explore this problem.
Efficient Cooperation between Java and Native Codes – JNI Performance Benchmark
JNI performance benchmarks for several popular Java Virtual Machine implementations are presented; they may be useful in avoiding certain JNI pitfalls and provide a better understanding of JNI-related performance issues.
Exploiting superword level parallelism with multimedia instruction sets
This paper has developed a simple and robust compiler for detecting SLP that targets basic blocks rather than loop nests, and is able to exploit parallelism both across loop iterations and within basic blocks.
Returning Control to the Programmer
Server and workstation hardware architecture is continually improving, yet interpreted languages have failed to keep pace with the proper utilization of modern processors, and the performance disparity will grow exponentially as long as the available SIMD units remain underutilized in interpreted-language environments.
Inlining java native calls at runtime
This work leverages the ability to store statically-generated IL alongside native binaries, to facilitate native inlining at Java callsites at JIT compilation time and shows speedups of up to 93X when inlining and callback transformation are combined.
An efficient native function interface for Java
GNFI is introduced, which is faster than JNI in all relevant cases and more flexible because it avoids the JNI boiler-plate code, and enables the user to directly invoke native code from Java applications.
The Java HotSpot™ Server Compiler
The Java HotSpot™ Server Compiler achieves improved asymptotic performance through a combination of object-oriented and classical-compiler optimizations. Aggressive inlining using class-hierarchy…