Low overhead dynamic binary translation on ARM

@inproceedings{DAntras2017LowOD,
  title={Low overhead dynamic binary translation on ARM},
  author={Amanieu D'Antras and Cosmin Gorgovan and Jim D. Garside and Mikel Luj{\'a}n},
  booktitle={PLDI},
  year={2017}
}
The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining binary compatibility with previous versions of the ARM architecture through AArch32, a 32-bit execution mode. Most hardware implementations of ARMv8 processors support both AArch32 and AArch64, which comes at a cost in hardware complexity. We present MAMBO-X64, a dynamic binary translator for Linux which executes 32-bit ARM binaries using only the AArch64 instruction set. We have… 

Exploiting Vector Processing in Dynamic Binary Translation
TLDR
This paper studies the fundamental issues involved in using cross-ISA Dynamic Binary Translation (DBT) to convert non-vectorized loops to vector/SIMD forms, achieving the greater computation throughput available in newer processor architectures.
Work-in-Progress: Exploiting SIMD Capability in an ARMv7-to-ARMv8 Dynamic Binary Translator
TLDR
This paper presents a software-based solution for backward compatibility via Dynamic Binary Translation (DBT) that is able to run ARMv7 executables on pure ARMv8 devices and achieves an average speedup of 1.49× compared to ARMv7 native runs across various benchmarks.
Improving Startup Performance in Dynamic Binary Translators
TLDR
This work analyzes the extent and causes of a DBT system's startup latency, and proposes and assesses the potential of a new technique that parallelizes program translation on multi-core machines to reduce its run-time cost.
How to Test, Analyze, and Reduce Memory Interference Delay in Modern COTS Multicore Systems?
TLDR
This paper proposes a software-based testing approach for analyzing memory interference delay, when cores are exposed to extensive read/write requests that access in parallel their Cache Coherent Interconnect.
Unleashing the Power of Learning: An Enhanced Learning-Based Approach for Dynamic Binary Translation
TLDR
An enhanced learning-based approach that relaxes such equivalence requirements but supplements them with constraining conditions to make them semantically equivalent when such rules are applied and can improve the dynamic coverage of the translation.
Transkernel: Bridging Monolithic Kernels to Peripheral Cores
TLDR
This work presents a new OS structure in which a lightweight virtual executor called transkernel offloads specific phases from a monolithic kernel, and shows that while cross-ISA DBT is typically used under the assumption of efficiency loss, it can enable efficiency gain, even on off-the-shelf hardware.

References

HyperMAMBO-X64: Using Virtualization to Support High-Performance Transparent Binary Translation
TLDR
A key factor in the low overhead of HyperMAMBO-X64 is its deep integration with the virtualization and memory management features of ARMv8, which are exploited to support cached translations across multiple address spaces while ensuring that translated code remains consistent with the source instructions it is based on.
Optimizing Indirect Branches in Dynamic Binary Translators
TLDR
MAMBO-X64, a dynamic binary translator that translates 32-bit ARM (AArch32) code to 64-bit ARM (AArch64) code, uses three novel techniques to improve the performance of indirect branch translation, which allows the performance to be on par with thread-private hash tables while having superior memory scalability.
DIGITAL FX!32: Combining Emulation and Binary Translation
TLDR
DIGITAL FX!32 software combines emulation and binary translation to provide fast, transparent execution of Intel x86 applications on Alpha systems, making hundreds of new applications available on Alpha-based platforms running the Windows NT operating system.
StarDBT: An Efficient Multi-platform Dynamic Binary Translation System
TLDR
For Windows applications, which are typically multi-threaded, GUI-based interactive applications with a large code footprint, the StarDBT system provides acceptable performance in many cases; however, there are important scenarios in which dynamic translation still incurs significant runtime overhead, raising issues for further research.
Hardware support for control transfers in code caches
  • Ho-Seop Kim, James E. Smith
  • Computer Science
    Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.
  • 2003
TLDR
This work analyzes several key aspects of superblock chaining and finds that a conventional baseline code cache with software jump target prediction results in 14.6% IPC loss versus the original binary, and introduces a modified software prediction technique that reduces the IPC losses to 11.4%.
Lightweight Memory Tracing
TLDR
The software-only approach enables memory tracing for unmodified, binary-only x86 applications using the x64 extension that is available in current CPUs; no OS extensions or special hardware is required.
PA-RISC to IA-64: Transparent Execution, No Recompilation
TLDR
To help PA-RISC (precision architecture-reduced instruction set computing) users migrate to its upcoming IA-64 systems, Hewlett-Packard has developed the Aries software emulator, combining fast interpretation and static translation.
Generating low-overhead dynamic binary translators
TLDR
This paper uses fastBT, a table-based dynamic binary translator that uses a code cache and various optimizations for indirect control transfers to illustrate the design tradeoffs in binary translators, and presents an analysis of the most challenging sources of overhead.
HDTrans: a low-overhead dynamic translator
TLDR
HDTrans is presented, a light-weight IA-32 to IA-32 binary translation system that uses some simple and effective translation techniques in combination with established trace linearization and code caching optimizations, along with an analysis of the effectiveness of post-compile static pre-translation techniques for overhead reduction.
The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges
  • J. Dehnert, B. Grant, J. Mattson
  • Computer Science
    International Symposium on Code Generation and Optimization, 2003. CGO 2003.
  • 2003
TLDR
The Crusoe paradigm of aggressive speculation, recovery to a consistent x86 state using unique hardware commit-and-rollback support, and adaptive retranslation when exceptions occur too often to be handled efficiently by interpretation are presented.