Dataflow Support in x86_64 Multicore Architectures through Small Hardware Extensions

@article{Mondelli2015DataflowSI,
  title={Dataflow Support in x86\_64 Multicore Architectures through Small Hardware Extensions},
  author={Andrea Mondelli and Nam Ho and Alberto Scionti and Marco Solinas and Antoni Portero and Roberto Giorgi},
  journal={2015 Euromicro Conference on Digital System Design},
  year={2015},
  pages={526--529}
}
  • A. Mondelli, Nam Ho, A. Scionti, M. Solinas, A. Portero, R. Giorgi
  • Published 26 August 2015
  • Computer Science
  • 2015 Euromicro Conference on Digital System Design
The path towards future high performance computers requires architectures able to efficiently run multi-threaded applications. In this context, dataflow-based execution models can improve the performance by limiting the synchronization overhead, thanks to a simple producer-consumer approach. This paper advocates the ISE of standard cores with a small hardware extension for efficiently scheduling the execution of threads on the basis of dataflow principles. A set of dedicated instructions allow…
Bridging a Data-Flow Execution Model to a Lightweight Programming Model
  • R. Giorgi, Marco Procaccini
  • Computer Science
  • 2019 International Conference on High Performance Computing & Simulation (HPCS)
  • 2019
TLDR
This work proposes this API as a simple programming model in the C language that can potentially permit an easy interface between DF-Threads and generic programming models, and achieves better performance-per-core compared to OpenMPI and CUDA.
Data-Driven Concurrency for High Performance Computing
TLDR
This work utilizes dynamic dataflow/data-driven techniques to improve the performance of high performance computing (HPC) systems and compares the proposed framework to MPI, DDM-VM, and OmpSs@Cluster to show that it obtains comparable or better performance.
An FPGA-based Scalable Hardware Scheduler for Data-Flow Models
TLDR
This work presents a scheduler for Data-Flow threads, implemented in reconfigurable logic for deployment on Reconfigurable MPSoCs (i.e., Multi-Processing Systems-on-Chip with FPGA); the Block Matrix Multiply benchmark is used to analyze the potential of the proposed model.
Exploring dataflow-based thread level parallelism in cyber-physical systems
  • R. Giorgi
  • Computer Science
  • Conf. Computing Frontiers
  • 2016
TLDR
The preliminary results confirm the scalability of the AXIOM execution model and the related memory model, which is key for the execution of threads while reducing the need for data transfers.
Analyzing the Impact of Operating System Activity of Different Linux Distributions in a Distributed Environment
TLDR
This paper presents the result-analysis tool flow and the OS impact of different Linux distributions running in a distributed environment consisting of several nodes with a full OS, and analyzes key metrics such as L2 cache miss rate, execution cycles, data access latency, and kernel cycles, showing up to 60% performance variation among the different OS distributions.
Chapter Two - Exploring Future Many-Core Architectures: The TERAFLUX Evaluation Framework
TLDR
In this chapter, different options for simulating a 1000 general-purpose-core system are explored, and the setup that successfully allowed the authors to evaluate their 1000-core target while running a full-system Linux operating system is shown.
Energy Efficiency Exploration on the ZYNQ Ultrascale+
TLDR
This paper demonstrates a possible architecture based on DataFlow-Threads (DF-Threads), a novel execution model, on the Zynq Ultrascale+ platform, in order to assess the energy efficiency of DF-Threads.
Scalable Embedded Systems: Towards the Convergence of High-Performance and Embedded Computing
  • R. Giorgi
  • Computer Science
  • 2015 IEEE 13th International Conference on Embedded and Ubiquitous Computing
  • 2015
TLDR
This paper describes how to dynamically and efficiently distribute the computational threads in symbiosis with an appropriate memory model to enable system scalability, so that the system can double the performance by simply connecting two boards.
AXIOM: A Scalable, Efficient and Reconfigurable Embedded Platform
TLDR
The AXIOM project (Agile, eXtensible, fast I/O Module), presented in this paper, introduces a new hardware-software platform for CPS, which can provide an easy parallel programming model and fast connectivity, in order to scale up performance by adding multiple boards.

References

SHOWING 1-10 OF 31 REFERENCES
Enhancing an x86_64 multi-core architecture with data-flow execution support
TLDR
This paper proposes to augment cores with a minimalistic set of hardware units and dedicated instructions that allow efficiently scheduling the execution of threads on the basis of data-flow principles.
A scalable thread scheduling co-processor based on data-flow principles
TLDR
The integration of standard cores with dedicated co-processing units that enable the system to support a fine-grain data-flow execution model developed within the TERAFLUX project is proposed and its capability of scaling with the increasing number of cores is demonstrated.
DTA-C: A Decoupled multi-Threaded Architecture for CMP Systems
TLDR
The SDF model is extended for use in future scalable CMP systems, where wire delay forces the design to be partitioned, and design choices are suggested to improve the scalability of the basic design.
Simulating a Multi-core x86_64 Architecture with Hardware ISA Extension Supporting a Data-Flow Execution Model
  • Nam Ho, A. Portero, +4 authors R. Giorgi
  • Computer Science
  • 2014 2nd International Conference on Artificial Intelligence, Modelling and Simulation
  • 2014
TLDR
This paper presents a data-flow based execution model that exposes up to millions of fine-grain threads to the multi-core x86_64 architecture and shows better scaling and smaller saturation as the number of workers increases.
Architectural Support for Data-Driven Execution
TLDR
This work provides architectural support for data-driven execution for the Data-Driven Multithreading (DDM) model and its integration into a multicore system with eight cores on a Virtex-6 FPGA with negligible overheads.
Carbon: architectural support for fine-grained parallelism on chip multiprocessors
TLDR
Carbon, a hardware technique to accelerate dynamic task scheduling on scalable CMPs, is proposed; it delivers significant performance improvements over the best software scheduler: on average for 64 cores, 68% faster on a set of loop-parallel benchmarks and 109% better on a set of task-parallel benchmarks.
Architectural Support for Fault Tolerance in a Teradevice Dataflow System
TLDR
This paper presents a fault tolerant architecture for a coarse-grained dataflow system, leveraging the inherent features of the dataflow execution model and provides methods to dynamically detect and manage permanent, intermittent, and transient faults during runtime.
An Introduction to DF-Threads and their Execution Model
  • R. Giorgi, P. Faraboschi
  • Computer Science
  • 2014 International Symposium on Computer Architecture and High Performance Computing Workshop
  • 2014
TLDR
This paper introduces the idea of using the dataflow concept to define novel thread types, called Data-Flow-Threads or DF-Threads, aiming at a more complete model that manages mutable shared state by relying on transactional memory semantics.
Accelerating Haskell on a Dataflow Architecture : a case study including Transactional Memory
A possible direction for exploiting the computational power of multi/many core chips is to rely on a massive usage of Thread Level Parallelism (TLP). We focus on the Decoupled Threaded Architecture,…
Transactional Memory on a Dataflow Architecture for Accelerating
Dataflow Architectures have been explored extensively in the past and are now re-evaluated from a different perspective as they can provide a viable solution to efficiently exploit multi/many core…