Virtual machine showdown: stack versus registers

@inproceedings{Shi2005VirtualMS,
  title={Virtual machine showdown: stack versus registers},
  author={Yunhe Shi and David Gregg and Andrew Beatty and M. Anton Ertl},
  booktitle={VEE '05},
  year={2005}
}
Virtual machines (VMs) are commonly used to distribute programs in an architecture-neutral format, which can easily be interpreted or compiled. A long-running question in the design of VMs is whether stack architecture or register architecture can be implemented more efficiently with an interpreter. We extend existing work on comparing virtual stack and virtual register architectures in two ways. Firstly, our translation from stack to register code is much more sophisticated. The result is that… 
Swift: a register-based JIT compiler for embedded JVMs
TLDR
This paper presents a fast and effective JIT technique for mobile devices, building on a register-based Java bytecode format that is more similar to the underlying machine architecture, and proposes Swift, a novel JIT compiler for register-based bytecode that generates native code for RISC machines.
A Performance Survey on Stack-based and Register-based Virtual Machines
TLDR
This paper presents two lightweight, custom-designed, Turing-equivalent virtual machines designed specifically for benchmarking virtual machine performance: the "Conceptum" stack-based virtual machine and the "Inertia" register-based virtual machine.
Optimizing software-hardware interplay in efficient virtual machines
TLDR
This thesis work provides designs for low maintenance, high efficiency VMs, and demonstrates the large performance improvement potentially enabled by tailoring language implementation to modern hardware by optimizing the software-hardware interplay.
Ahead-of-Time Compilation of Stack-Based JVM Bytecode on Resource-Constrained Devices
TLDR
This paper identifies three distinct sources of overhead, two of which are related to the JVM’s stack-based architecture, and proposes a set of optimisations to target each of them, which reduce code size overhead by 59%.
One VM to rule them all
TLDR
This work describes a new approach to virtual machine (VM) construction that amortizes much of the effort in initial construction by allowing new languages to be implemented with modest additional effort, and suggests that high performance is attainable while preserving a modular and layered architecture.
Improved Ahead-of-time Compilation of Stack-based JVM Bytecode on Resource-constrained Devices
TLDR
This article identifies the major sources of overhead resulting from this basic approach and presents optimisations to remove most of the remaining performance overhead, and over half the size overhead, reducing them to 67% and 77%, respectively.
The Design and Implementation of a Bytecode for Optimization on Heterogeneous Systems
TLDR
The approach used here is to combine elements of the Dalvik® virtual machine with concepts from the OpenCL® heterogeneous computing platform, along with an annotation system so that the results of complex compile time analysis can be available to the Just-In-Time compiler.
Bridging The Gap Between Machine And Language Using First-Class Building Blocks
TLDR
It is argued that the best way to open the VM is to eliminate it, and Pinocchio, a natively compiled Smalltalk, is presented, in which three basic building blocks for object-oriented languages are identified and reified.
A High Performance Java Card Virtual Machine Interpreter Based on an Application Specific Instruction-Set Processor
TLDR
This paper presents a hardware/software co-design solution for the performance improvement of the bytecode interpreter, adopting a pseudo-threaded code interpreter that allows a better run-time performance with a small amount of additional code.
Virtual Machine and Bytecode for Optimization on Heterogeneous Systems
TLDR
The approach used here is to combine elements of the Dalvik virtual machine with concepts from the OpenCL heterogeneous computing platform, along with an annotation system so that the results of complex compile time analysis can be available to the Just-In-Time compiler.

References

The case for virtual register machines
TLDR
This paper presents a working system for translating stack-based Java virtual machine (JVM) code to a simple register code, and argues that the high cost of dispatches makes register machines attractive even at the cost of increased loads.
Catenation and specialization for Tcl virtual machine performance
TLDR
In the context of the Tcl VM, bytecodes are converted to native SPARC code by concatenating the native instructions used by the VM to implement each bytecode instruction, and the dispatch loop is eliminated.
Vmgen—a generator of efficient virtual machine interpreters
TLDR
This paper presents an interpreter generator that takes simple virtual machine instruction descriptions as input and generates C code for processing the instructions in several ways: execution, virtual machine code generation, disassembly, tracing, and profiling.
Stack caching for interpreters
  • M. Ertl
  • Computer Science
    PLDI '95
  • 1995
TLDR
This paper explores two methods, one dynamic and one static, for reducing the argument-access overhead of virtual stack machines by caching top-of-stack values in (real machine) registers.
The Behavior of Efficient Virtual Machine Interpreters on Modern Architectures
TLDR
The results show that, for current branch predictors, threaded-code interpreters cause fewer mispredictions and are almost twice as fast as switch-based interpreters on modern superscalar architectures.
CACAO - A 64-bit JavaVM Just-in-Time Compiler
TLDR
CACAO, a just-in-time compiler for Java, translates Java byte code on demand into native code for the ALPHA processor and executes Java programs up to 85 times faster than the JDK interpreter and up to 7 times faster than the kaffe JIT compiler.
Context threading: a flexible and efficient dispatch technique for virtual machine interpreters
TLDR
The dispatch technique, context threading, improves branch prediction and performance by aligning hardware and virtual machine state: virtual branching instructions are converted to native branches, mobilizing the hardware's branch prediction resources.
Towards Superinstructions for Java Interpreters
TLDR
This paper describes work in progress on the design and implementation of a system of superinstructions for an efficient Java interpreter for connected devices and embedded systems. It describes the basic interpreter and the interpreter generator that is used to automatically create optimised source code for superinstructions, and discusses Java-specific issues relating to superinstructions.
Code sharing among states for stack-caching interpreter
TLDR
This paper presents a code sharing mechanism that performs as efficiently as the stack-caching interpreter while keeping the code size as compact as that of general threaded interpreters.
Measuring Limits of Fine-grained Parallelism
TLDR
It is shown that a traditional depth-first copying garbage collector would better reduce run time on a low-level parallel machine with speculative execution if it were to follow CDR links before CAR links.