An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning

  @inproceedings{watanabe2021fpga,
    title={An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning},
    author={Hirohisa Watanabe and Mineto Tsukada and Hiroki Matsutani},
    booktitle={2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)},
    year={2021}
  }
DQN (Deep Q-Network) is a method to perform Q-learning for reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, making them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning…
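Since the abstract frames DQN as Q-learning performed with a deep network, a minimal tabular sketch of the underlying Q-learning update may help; the function and toy state/action names below are illustrative, not from the paper.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict mapping (state, action) pairs to values."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
    return Q

# toy usage: one transition with reward 1.0 in a two-action problem
Q = q_update({}, "s0", "left", 1.0, "s1", ["left", "right"])
```

DQN replaces the table with a neural network trained by backpropagation, which is exactly the memory- and compute-heavy part the paper's OS-ELM-based approach avoids.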


A Packet Routing using Lightweight Reinforcement Learning Based on Online Sequential Learning

This work proposes OS-ELM QN (Q-Network), a lightweight machine learning algorithm with a prioritized experience replay buffer and a multi-agent learning function to improve learning performance, and compares it to a deep reinforcement learning based packet routing method using a network simulator.
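The prioritized experience replay buffer mentioned above is commonly implemented as proportional sampling by TD-error magnitude; the class below is a generic sketch with hypothetical names, not the paper's implementation.

```python
import random

class PrioritizedReplay:
    """Minimal proportional prioritized replay buffer: transitions with
    larger TD errors are sampled more often for training."""
    def __init__(self, capacity=1000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error=1.0):
        # evict the oldest transition once capacity is reached
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, k):
        # sample k transitions with probability proportional to priority
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        return random.choices(self.buffer, weights=weights, k=k)
```

The small epsilon added to each priority keeps zero-error transitions sampleable; `alpha` interpolates between uniform (0) and fully greedy (1) prioritization.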

A Survey of Domain-Specific Architectures for Reinforcement Learning

FPGA-based implementations are the focus of this work, but GPU-based approaches are considered as well, and possible areas for future work are suggested, based on the preceding discussion of existing architectures.

Efficient Compressed Ratio Estimation using Online Sequential Learning for Edge Computing

This study developed an efficient RL method for edge devices, referred to as the actor-critic online sequential extreme learning machine (AC-OSELM), and a system to compress data by estimating an appropriate compression ratio on the edge using AC-OSELM.
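The OS-ELM family referenced here trains a single-hidden-layer network sequentially with a recursive-least-squares-style closed-form update instead of backpropagation, which is what makes it attractive for edge devices. A sketch of that core recursion (function and variable names are illustrative):

```python
import numpy as np

def oselm_update(beta, P, H, T):
    """One OS-ELM sequential step: given a new chunk with hidden-layer
    activations H (chunk_size x hidden) and targets T (chunk_size x outputs),
    update the output weights beta and the inverse-covariance matrix P
    in closed form, with no gradient descent."""
    K = np.linalg.inv(np.eye(H.shape[0]) + H @ P @ H.T)
    P = P - P @ H.T @ K @ H @ P
    beta = beta + P @ H.T @ (T - H @ beta)
    return beta, P
```

Because each chunk is absorbed exactly once, no replay of past data is needed; the sequential solution coincides with batch least squares over all data seen so far.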

Performance improvement of reinforcement learning algorithms for online 3D bin packing using FPGA

This paper uses an FPGA as a hardware accelerator to reduce the inference time of DQN as well as its pre-/post-processing steps, allowing the optimised algorithm to cover the entire search space within the given time constraints.

TD3lite: FPGA Acceleration of Reinforcement Learning with Structural and Representation Optimizations

To address the resource and computational overhead due to inference and training of the multiple neural networks of TD3, this work proposes TD3lite, an integrated approach consisting of a network sharing technique combined with bitwidth-optimized block floating-point arithmetic.

A Hardware Implementation for Deep Reinforcement Learning Machine

This paper proposes a hardware architecture to implement the DQN algorithm, suitable for real-time applications; its main features are low power consumption and suitability for limited hardware resources.

FAQ: A Flexible Accelerator for Q-Learning with Configurable Environment

  • Marc Rothmann, Mario Porrmann
  • Computer Science
    2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)
  • 2022
Reinforcement Learning is an area of machine learning concerned with optimizing the behavior of an agent in an environment by maximizing cumulative rewards. This can be done with classical…

E2HRL: An Energy-efficient Hardware Accelerator for Hierarchical Deep Reinforcement Learning

The proposed Energy Efficient Hierarchical Reinforcement Learning (E2HRL), which is a scalable hardware architecture for RL applications, utilizes a cross-layer design methodology for achieving better energy efficiency, smaller model size, higher accuracy, and system integration at the software and hardware layers.

Machine Learning for the Control and Monitoring of Electric Machine Drives: Advances and Trends

This review paper systematically summarizes the existing literature on utilizing machine learning techniques for the control and monitoring of electric machine drives and provides some outlook toward promoting its widespread application in the industry with a focus on deploying ML algorithms onto embedded system-on-chip (SoC) field-programmable gate array (FPGA) devices.

Binarized P-Network: Deep Reinforcement Learning of Robot Control from Raw Images on FPGA

This letter proposes a novel DRL algorithm called Binarized P-Network (BPN), which learns image-input control policies using binarized convolutional neural networks (BCNNs), and adopts a robust value update scheme called Conservative Value Iteration, which is tolerant of function approximation errors.

A Neural Network-Based On-Device Learning Anomaly Detector for Edge Devices

Experiments show that ONLAD has favorable anomaly detection capability in an environment that simulates concept drift, and ONLAD Core realizes on-device learning for edge devices at low power consumption, which realizes standalone execution where data transfers between edge and server are not required.

Adam: A Method for Stochastic Optimization

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
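The adaptive estimates of lower-order moments described above reduce to a short per-parameter rule; this sketch follows the published Adam update, with illustrative names, operating on plain Python lists for clarity.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a list of scalar parameters.
    m, v are running first/second moment estimates; t is the 1-based step."""
    new_theta, new_m, new_v = [], [], []
    for th, g, mi, vi in zip(theta, grad, m, v):
        mi = b1 * mi + (1 - b1) * g        # biased first-moment estimate
        vi = b2 * vi + (1 - b2) * g * g    # biased second-moment estimate
        m_hat = mi / (1 - b1 ** t)         # bias correction
        v_hat = vi / (1 - b2 ** t)
        new_theta.append(th - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v
```

On the first step the bias-corrected moments equal the raw gradient statistics, so the update is roughly lr times the gradient's sign, independent of its scale.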

An Area-Efficient Implementation of Recurrent Neural Network Core for Unsupervised Anomaly Detection

This work analyzes the Echo State Network (ESN), a simple form of Recurrent Neural Network (RNN), and evaluates its area-efficient implementation in terms of anomaly detection capability and area.
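What makes ESNs area-efficient is that the recurrent reservoir weights are fixed and random, so only a linear readout is ever trained; one reservoir step can be sketched as follows (a generic leaky-integrator formulation with illustrative names, not the paper's design):

```python
import math

def esn_step(x, u, W, W_in, leak=0.3):
    """One leaky-integrator ESN reservoir update for scalar input u:
    x' = (1 - leak) * x + leak * tanh(W_in * u + W x).
    W (n x n) and W_in (n) are fixed random weights; only a linear
    readout on the state x is trained."""
    n = len(x)
    pre = [W_in[i] * u + sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [(1 - leak) * x[i] + leak * math.tanh(pre[i]) for i in range(n)]
```

With no backpropagation through time, training reduces to a linear regression on collected states, which maps naturally onto small FPGAs.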

On robust estimation of the location parameter

Spectral Normalization for Generative Adversarial Networks

This paper proposes a novel weight normalization technique called spectral normalization to stabilize the training of the discriminator and confirms that spectrally normalized GANs (SN-GANs) are capable of generating images of better or equal quality relative to the previous training stabilization techniques.
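The core of spectral normalization is dividing a weight matrix by its largest singular value, typically estimated cheaply by power iteration; the function below is a plain-Python sketch of that idea, not the paper's implementation.

```python
def spectral_normalize(W, iters=30):
    """Divide matrix W (list of rows) by its largest singular value sigma,
    estimated by power iteration on the pair u = Wv, v = W^T u."""
    rows, cols = len(W), len(W[0])
    v = [1.0] * cols
    for _ in range(iters):
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = W v
        nu = sum(x * x for x in u) ** 0.5 or 1.0
        u = [x / nu for x in u]
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # v = W^T u
        nv = sum(x * x for x in v) ** 0.5 or 1.0
        v = [x / nv for x in v]
    # Rayleigh-quotient estimate of the top singular value
    sigma = sum(u[i] * sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows))
    return [[W[i][j] / sigma for j in range(cols)] for i in range(rows)]
```

After normalization the layer's Lipschitz constant (with respect to the spectral norm) is at most 1, which is what stabilizes discriminator training.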

Regularized online sequential learning algorithm for single-hidden layer feedforward neural networks

Spectral Norm Regularization for Improving the Generalizability of Deep Learning

This work proposes a simple and effective regularization method, referred to as spectral norm regularization, which penalizes the high spectral norm of weight matrices in neural networks and exhibits better generalizability than other baseline methods.

Reinforcement learning for robots using neural networks

This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively through reinforcement learning, enabling their application to complex robot-learning problems.

Multilayer feedforward networks are universal approximators

Human-level control through deep reinforcement learning

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.