ESP4ML: Platform-Based Design of Systems-on-Chip for Embedded Machine Learning

@article{Giri2020ESP4MLPD,
  title={ESP4ML: Platform-Based Design of Systems-on-Chip for Embedded Machine Learning},
  author={Davide Giri and Kuan-Lin Chiu and Giuseppe Di Guglielmo and Paolo Mantovani and Luca P. Carloni},
  journal={2020 Design, Automation \& Test in Europe Conference \& Exhibition (DATE)},
  year={2020},
  pages={1049--1054}
}
We present ESP4ML, an open-source system-level design flow to build and program SoC architectures for embedded applications that require the hardware acceleration of machine learning and signal processing algorithms. We realized ESP4ML by combining two established open-source projects (ESP and HLS4ML) into a new, fully-automated design flow. For the SoC integration of accelerators generated by HLS4ML, we designed a set of new parameterized interface circuits synthesizable with high-level… 

Ariane + NVDLA: Seamless Third-Party IP Integration with ESP
TLDR
This work adds support for the seamless integration of third-party accelerators by developing a new type of interface that retains the benefits of the ESP platform services and allows designers to rapidly prototype complex SoC architectures on FPGAs with a push-button design flow.
MasterMind: Many-Accelerator SoC Architecture for Real-Time Brain-Computer Interfaces
TLDR
A Linux-supporting, RISC-V based SoC that integrates multiple hardware accelerators and a thorough design-space exploration at the accelerator level and at the SoC level to enable real-time communication with the brain.
Building Complete Heterogeneous Systems-on-Chip in C: From Hardware Accelerators to CPUs
TLDR
This work presents a methodology to generate entire heterogeneous SoCs in C and investigates the generation of processors and interfaces at the behavioral level, as these are important parts of any SoC but have long been thought not to be efficiently synthesizable using HLS.
Towards a Modular RISC-V Based Many-Core Architecture for FPGA Accelerators
TLDR
The motivation behind this work is to introduce a modular cluster-based many-core architecture for FPGA accelerators that is re-usable and can be flexibly tailored to implement different many-core taxonomies with less design time and cost by using regular and replicated sets of computing, memory, and interconnection blocks.
Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs
TLDR
Cohmeleon applies reinforcement learning to select the best coherence mode for each accelerator dynamically at runtime, as opposed to statically at design time, and it can match runtime solutions that are manually tuned for the target architecture.
Enabling Heterogeneous, Multicore SoC Research with RISC-V and ESP
TLDR
Modifications to ESP, an open-source SoC design platform, are presented to enable multicore execution with the RISC-V CVA6 processor, allowing RISC-V-based SoCs designed with ESP for FPGA to boot Linux SMP and execute multithreaded applications.
Accelerator Integration for Open-Source SoC Design
TLDR
This work presents a design flow for the seamless hardware and software integration of accelerators into a complete SoC and for its evaluation through rapid FPGA-based prototyping.
From Domain-Specific Languages to Memory-Optimized Accelerators for Fluid Dynamics
TLDR
This paper proposes an automated tool flow from a domain-specific language (DSL) to generate accelerators for computational fluid dynamics on FPGA that simplifies the exploration of parameters and constraints such as on-chip memory usage and a decoupled optimization of memory and logic resources.
Automatic Creation of High-Bandwidth Memory Architectures from Domain-Specific Languages: The Case of Computational Fluid Dynamics
TLDR
An automated tool flow from a domain-specific language (DSL) to generate massively-parallel accelerators on FPGA to address challenges of computational fluid dynamics (CFD), using the case of CFD as a paradigmatic example.
Agile SoC development with open ESP
TLDR
Conceived as a heterogeneous integration platform and tested through years of teaching at Columbia University, ESP supports the open-source hardware community by providing a flexible platform for agile SoC development.

References

SHOWING 1-10 OF 29 REFERENCES
DeepBurning: Automatic generation of FPGA-based learning accelerators for the Neural Network family
TLDR
A design automation tool that allows application developers to build from scratch learning accelerators that target their specific NN models with custom configurations and optimized performance, and that greatly simplifies the design flow of NN accelerators for machine learning or AI application developers.
FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge
TLDR
Results show that the proposed DNN model and accelerator outperform the state-of-the-art FPGA designs in all aspects including Intersection-over-Union (IoU) and energy efficiency.
High-level synthesis of accelerators in embedded scalable platforms
TLDR
Embedded scalable platforms combine a flexible socketed architecture for heterogeneous system-on-chip (SoC) design and a companion system-level design methodology that simplifies the design, integration, and programming of the heterogeneous components in the SoC.
Handling large data sets for high-performance embedded applications in heterogeneous systems-on-chip
TLDR
This work presents a solution to preserve the speedup of accelerators when scaling from small to large data sets by combining specialized DMA and address translation with a software layer in Linux.
Deep Neural Network Model and FPGA Accelerator Co-Design: Opportunities and Challenges
  • Cong Hao, Deming Chen
  • Computer Science
    2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT)
  • 2018
TLDR
A simultaneous DNN and hardware accelerator co-design method to push the DNN performance on FPGAs is discussed and new ideas to further improve DNN development productivity and design quality are proposed.
NoC-Based Support of Heterogeneous Cache-Coherence Models for Accelerators
TLDR
This work proposes an extension of a standard directory-based cache-coherence protocol and presents its design as part of a scalable memory hierarchy implemented over a NoC, along with a many-accelerator SoC architecture that can support three main cache-coherence models for accelerators: non-coherent, last-level-cache-coherent, and fully-coherent.
A Survey and Evaluation of FPGA High-Level Synthesis Tools
  • R. Nane, V. Sima, K. Bertels
  • Computer Science
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
  • 2016
TLDR
This work uses a first-published methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources.
Invited: The case for Embedded Scalable Platforms
  • L. Carloni
  • Computer Science
    2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC)
  • 2016
TLDR
Embedded Scalable Platforms are a novel approach to SoC design and programming that addresses design-complexity challenges by combining an architecture and a methodology that leverages compositional design-space exploration with high-level synthesis.
Machine learning on FPGAs to face the IoT revolution
TLDR
A series of effective design techniques for implementing DNNs on FPGAs with high performance and energy efficiency including the use of configurable DNN IPs, performance and resource modeling, resource allocation across DNN layers, and DNN reduction and re-training are presented.
An FPGA-based infrastructure for fine-grained DVFS analysis in high-performance embedded systems
TLDR
This work shows how an FPGA-based infrastructure can be used to first generate SoCs with loosely-coupled accelerators, and then perform design-space exploration considering several DVFS policies under full-system workload scenarios, sweeping spatial and temporal domain granularity.