Processor virtualization and split compilation for heterogeneous multicore embedded systems

@inproceedings{Cohen2010ProcessorVA,
  title={Processor virtualization and split compilation for heterogeneous multicore embedded systems},
  author={Albert Cohen and Erven Rohou},
  booktitle={Design Automation Conference},
  year={2010},
  pages={102--107}
}
Embedded multiprocessors have always been heterogeneous, driven by the power efficiency and compute density of hardware specialization. We aim to achieve portability and sustained performance of complete applications, leveraging diverse programmable cores. We combine instruction-set virtualization with just-in-time compilation, compiling C, C++ and managed languages to a target-independent intermediate language, maximizing the information flow between compilation steps in a split optimization…
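
The split-optimization idea in the abstract can be illustrated with a deliberately small sketch, assuming a hypothetical annotation format rather than the authors' actual one. An expensive analysis runs ahead of time and stores a compact result (here, the smallest loop-carried flow-dependence distance of a loop with a single store) alongside the target-independent code; the just-in-time compiler on each core then makes a constant-time decision from that result and its own SIMD width. The names loop_annotation, offline_analyze and online_vector_width are illustrative only.

```c
#include <assert.h>

/* Hypothetical annotation, computed ahead of time for a loop of the shape
 *   for (i = 0; i < n; i++) a[i + w] = f(a[i + r1], a[i + r2], ...);
 * (one store, any number of loads, f element-wise and side-effect free)
 * and shipped with the target-independent bytecode. */
typedef struct {
    int min_flow_distance; /* smallest positive w - r; 0 = no loop-carried flow dependence */
} loop_annotation;

/* Offline step: compare the store offset with every load offset.
 * A real compiler would run a full dependence analysis here; the cost
 * is paid once, before deployment. */
static loop_annotation offline_analyze(int write_offset,
                                       const int *read_offsets, int nreads)
{
    loop_annotation ann = { 0 };
    for (int j = 0; j < nreads; j++) {
        /* d > 0: the value stored in iteration i is loaded in iteration i + d. */
        int d = write_offset - read_offsets[j];
        if (d > 0 && (ann.min_flow_distance == 0 || d < ann.min_flow_distance))
            ann.min_flow_distance = d;
    }
    return ann;
}

/* Online step, run by the JIT on the actual core: a constant-time choice of
 * vector width from the annotation and the target's SIMD width (in elements). */
static int online_vector_width(loop_annotation ann, int simd_width)
{
    assert(simd_width >= 1);
    if (ann.min_flow_distance == 0 || ann.min_flow_distance >= simd_width)
        return simd_width; /* no flow dependence inside one vector chunk */
    return 1;              /* fall back to scalar code */
}
```

For example, a loop that writes a[i + 1] from a[i] carries a dependence of distance 1, so every core falls back to scalar code, whereas writing a[i + 8] from a[i] lets a 4-wide core vectorize without redoing the analysis. The point of the split is that the dependence analysis is paid once, offline, while the per-core decision stays trivial.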

Citations

Process-level virtualization for runtime adaptation of embedded software

  • Kim M. Hazelwood
  • 2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC)
Some of the existing dynamic binary modification tools that can be used to perform runtime adaptation of embedded software are described, and the challenges of balancing memory overheads and performance when developing these tools for embedded platforms are discussed.

Hardware acceleration for Just-In-Time compilation on heterogeneous embedded systems

This paper proposes a solution based on a dedicated processor with specialized instructions for critical functions to improve the efficiency of code generators, and shows a 15% overall speedup in the code generator's execution time, based on the LLVM framework.

Combining Processor Virtualization and Component-Based Engineering in C for Many-Core Heterogeneous Embedded MPSoCs

This work presents a C-based programming model for developing fine-grained component-based applications, together with a toolset that compiles them into a processor-independent bytecode representation that can be deployed on heterogeneous MPSoCs.

Hardware Acceleration of Red-Black Tree Management and Application to Just-In-Time Compilation

This paper presents a performance analysis of different JIT compilation technologies in order to identify hardware and software optimization opportunities, and proposes a solution based on a dedicated processor with specialized instructions for critical functions of JIT compilers.

Full-virtualization on MIPS-based MPSOCs embedded platforms with real-time support

This paper presents an embedded hypervisor designed to provide full virtualization and real-time execution of applications, running on a lightweight MIPS-based MPSoC platform extended to provide hardware-based virtualization.

Boosting Single Thread Performance in Mobile Processors via Reconfigurable Acceleration

This paper presents the design of an architecture with 'general-purpose' accelerators, reconfigured on an application-by-application basis, and evaluates the cost/performance implications of the design.

Adding virtualization support in MIPS 4Kc-based MPSoCs

This work proposes adopting full virtualization for MPSoCs, which requires no guest OS changes, and, to reduce known virtualization overheads, proposes hardware modifications to a MIPS-based architecture.

Hardware-assisted virtualization targeting MIPS-based SoCs

This paper details how to adapt an existing MIPS-based architecture to support virtualization principles, and presents results demonstrating its correctness and efficiency.

Hardware virtualization-driven software task switching in reconfigurable multi-processor system-on-chip architectures

This work presents an approach for virtualization-driven mapping and switching of software tasks on embedded multi-processor Systems-on-Chip (MPSoCs), introducing a dynamically reconfigurable interconnection network, based on permutation networks, inside a virtualization middleware.

Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator

An industry-strength, LLVM-based parallel dynamic binary translator (DBT) implementing the ARCompact ISA is evaluated against three benchmark suites, demonstrating speedups of up to 2.08× on a standard quad-core Intel Xeon machine.

References

Showing 1-10 of 58 references

A parallel dynamic compiler for CIL bytecode

This work proposes an approach that leverages chip-multiprocessor (CMP) features to introduce a novel pipeline synchronization model for the internal threads of the ILDJIT dynamic compiler, achieving significant speedups over the baseline when the underlying hardware provides at least two cores.

Split Compilation: an Application to Just-in-Time Vectorization

This work focuses on automatic vectorization, a key optimization playing an increasing role in modern, power-efficient architectures, through a semantically rich and performance-friendly intermediate format.

Memory-efficient multithreaded code generation from Simulink for heterogeneous MPSoC

Using Simulink, this work proposes a functional modeling style that captures data-intensive and control-dependent target applications, and a system-architecture modeling style that seamlessly transforms the functional model into the target architecture.

The Java HotSpot™ Server Compiler

The Java HotSpot™ Server Compiler achieves improved asymptotic performance through a combination of object-oriented and classical-compiler optimizations. Aggressive inlining using class-hierarchy…

An Experimental Environment Validating the Suitability of CLI as an Effective Deployment Format for Embedded Systems

An experimental framework based on GCC is presented that validates the choice of CLI as a suitable processor-independent deployment format and offers a full development flow for the C language, generating a subset of pure CLI that does not require any virtual machine support other than a JIT compiler.

On the complexity of spill everywhere under SSA form

This paper provides an exhaustive study of the complexity of the "spill everywhere" problem under SSA form, and identifies some polynomial cases that, although impractical in a JIT context, can suggest simplified formulations for the design of aggressive allocators.

Portable and Efficient Auto-vectorized Bytecode: a Look at the Interaction between Static and JIT Compilers

This work adds vectorization capabilities to the CLI port of the GCC compiler and shows that vectorized bytecode is a viable approach: it delivers portable performance in the presence of SIMD extensions while incurring only a minor penalty when SIMD is not supported.

Split Register Allocation: Linear Complexity Without the Performance Penalty

This work presents a split compiler design in which more expensive ahead-of-time analyses guide lightweight just-in-time optimizations, and describes a split register allocator showing that linear complexity does not imply reduced code quality.
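
A split register allocator can be sketched in the same spirit. The version below is a hypothetical simplification, not the allocator described in that paper: offline, variables are ordered greedily by spill cost and the maximum register pressure remaining after spilling each prefix of that order is recorded; online, the JIT compiler for a core with k registers finds how many variables to spill with one linear scan of the annotation. Live sets are assumed to be bit masks over at most 32 variables, and all names (spill_annotation, offline_spill_annotation, online_spill_count) are illustrative.

```c
#include <assert.h>

#define MAX_VARS 32

/* Hypothetical annotation computed ahead of time and shipped with the bytecode. */
typedef struct {
    int nvars;
    int spill_order[MAX_VARS];        /* variables, cheapest to spill first */
    int pressure_after[MAX_VARS + 1]; /* max live variables once the first j are spilled */
} spill_annotation;

/* Number of set bits, written out to stay portable. */
static int count_bits(unsigned x)
{
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/* Offline: order variables by spill cost, then measure the pressure left
 * after each prefix of that order. Costs O(nvars * npoints), paid once.
 * Assumes live[p] only mentions variables 0 .. nvars-1. */
static spill_annotation offline_spill_annotation(const unsigned *live, int npoints,
                                                 const int *cost, int nvars)
{
    spill_annotation ann;
    assert(nvars >= 0 && nvars <= MAX_VARS);
    ann.nvars = nvars;
    /* Stable insertion sort by increasing cost (fine for small nvars). */
    for (int v = 0; v < nvars; v++) {
        int j = v;
        while (j > 0 && cost[ann.spill_order[j - 1]] > cost[v]) {
            ann.spill_order[j] = ann.spill_order[j - 1];
            j--;
        }
        ann.spill_order[j] = v;
    }
    unsigned spilled = 0;
    for (int j = 0; j <= nvars; j++) {
        if (j > 0)
            spilled |= 1u << ann.spill_order[j - 1];
        int max_live = 0;
        for (int p = 0; p < npoints; p++) {
            int n = count_bits(live[p] & ~spilled);
            if (n > max_live)
                max_live = n;
        }
        ann.pressure_after[j] = max_live;
    }
    return ann;
}

/* Online: for a core with k registers, return how many variables, taken from
 * the front of spill_order, to spill. One linear scan, no graph construction. */
static int online_spill_count(const spill_annotation *ann, int k)
{
    assert(k >= 0);
    for (int j = 0; j <= ann->nvars; j++)
        if (ann->pressure_after[j] <= k)
            return j;
    return ann->nvars; /* only reached if live sets mention out-of-range variables */
}
```

Under strict SSA form the interference graph is chordal, so bringing the maximum pressure down to k is enough for a k-coloring to exist; the greedy cost order, however, is only a heuristic and can spill more than an optimal "spill everywhere" solution.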

Comparing the size of .NET applications with native code

  • Roberto Costa, Erven Rohou
  • 2005 Third IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS'05)
The paper shows that the commonly assumed large code-size reduction does not hold in practice, and suggests that adopting such languages in embedded contexts must be justified by other arguments.

Trace-based just-in-time type specialization for dynamic languages

This work presents an alternative compilation technique for dynamically-typed languages that identifies frequently executed loop traces at run-time and then generates machine code on the fly that is specialized for the actual dynamic types occurring on each path through the loop.
...