Corpus ID: 221585948

Hardware Aware Training for Efficient Keyword Spotting on General Purpose and Specialized Hardware

@article{Blouw2020HardwareAT,
  title={Hardware Aware Training for Efficient Keyword Spotting on General Purpose and Specialized Hardware},
  author={Peter Blouw and Gurshaant Singh Malik and Benjamin Morcos and Aaron R. Voelker and Chris Eliasmith},
  journal={ArXiv},
  year={2020},
  volume={abs/2009.04465}
}
Keyword spotting (KWS) provides a critical user interface for many mobile and edge applications, including phones, wearables, and cars. As KWS systems are typically 'always on', maximizing both accuracy and power efficiency is central to their utility. In this work we use hardware aware training (HAT) to build new KWS neural networks based on the Legendre Memory Unit (LMU) that achieve state-of-the-art (SotA) accuracy and low parameter counts. This allows the neural network to run efficiently…
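The paper's HAT approach trains the network under the deployment hardware's constraints rather than quantizing after the fact. As a minimal sketch of the general idea (not the authors' exact procedure), quantization-aware training with a straight-through estimator might look like the following; the bit width, scaling rule, and tensor shapes are illustrative assumptions.

```python
import torch

def fake_quantize(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Round x to a signed fixed-point grid in the forward pass while
    passing gradients through unchanged (straight-through estimator)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    x_q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale
    # Forward value is x_q; backward gradient is the identity w.r.t. x.
    return x + (x_q - x).detach()

# Usage: apply to weights (and/or activations) inside the forward pass.
w = torch.randn(64, 40, requires_grad=True)
loss = fake_quantize(w).pow(2).sum()
loss.backward()  # gradients reach w through the straight-through path
```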

Citations

Estimating Levels of Engagement for Social Human-Robot Interaction using Legendre Memory Units
TLDR
The results showed that identifying untrained classes after training on the extremes is feasible, particularly when using the Legendre Delay Network.
TinySpeech: Attention Condensers for Deep Speech Recognition Neural Networks on Edge Devices
TLDR
TinySpeech is introduced: low-precision deep neural networks composed largely of attention condensers and tailored for on-device speech recognition via a machine-driven design exploration strategy, with one variant designed specifically for microcontroller operation constraints.
Continuous then discrete: A recommendation for building robotic brains
TLDR
This document argues that there are benefits to be gained by designing learning algorithms that exist in continuous time, as well as state, and only afterwards discretizing the algorithms for implementation on traditional computing models, or mapping them directly onto analog hardware.
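As a concrete instance of the continuous-then-discrete recipe, one can design an LTI system in continuous time and discretize it only at implementation time. The sketch below uses SciPy's zero-order-hold conversion; the example system matrices and time-step are chosen purely for illustration.

```python
import numpy as np
from scipy.signal import cont2discrete

# Continuous-time LTI system dx/dt = A x + B u, designed in continuous time.
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.eye(2)
D = np.zeros((2, 1))

dt = 1e-3  # implementation time-step (hardware-dependent; illustrative)
Ad, Bd, Cd, Dd, _ = cont2discrete((A, B, C, D), dt, method='zoh')

# Discrete update executed on conventional hardware:
#   x[t+1] = Ad @ x[t] + Bd @ u[t]
```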
Efficient Neuromorphic Signal Processing with Loihi 2
TLDR
This work showcases advanced spiking neuron models that can be used to efficiently process streaming data in simulation experiments on emulated Loihi 2 hardware and describes an algorithm for optical flow estimation using spatiotemporal RF neurons that requires over 90x fewer operations than a conventional DNN-based solution.
Hardware Acceleration for Embedded Keyword Spotting: Tutorial and Survey
TLDR
This article extensively surveys the different approaches taken by the recent state of the art (SotA) at the algorithmic, architectural, and circuit levels to enable KWS tasks in edge devices, to explore and guide the reader through the design of KWS systems.
Parallelizing Legendre Memory Unit Training
TLDR
The linear time-invariant (LTI) memory component of the LMU is leveraged to construct a simplified variant that can be parallelized during training (and yet executed as an RNN during inference), thus overcoming a well-known limitation of training RNNs on GPUs.
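To make the parallelization idea concrete: because the memory is linear time-invariant, its entire state trajectory is a convolution of the input with the system's impulse response, so it can be computed for all time-steps at once during training and still run step-by-step at inference. A minimal NumPy illustration of this equivalence (not the cited paper's implementation) follows; Ad and Bd stand for some already-discretized memory matrices.

```python
import numpy as np

def lti_states_parallel(Ad, Bd, x):
    """Compute m[t] = sum_{k=0..t} Ad^k @ Bd * x[t-k] for every t at once.

    Ad: (d, d) discretized state matrix; Bd: (d,) input vector;
    x: (T,) scalar input sequence. Returns the (T, d) memory trajectory.
    The trajectory equals a convolution of the input with the impulse
    response, so training can batch all time-steps; inference can still
    run the usual recurrence m[t] = Ad @ m[t-1] + Bd * x[t].
    """
    T, d = len(x), Ad.shape[0]
    H = np.empty((T, d))            # impulse response: H[k] = Ad^k @ Bd
    h = Bd.astype(float).copy()
    for k in range(T):
        H[k] = h
        h = Ad @ h
    # One 1-D convolution per state dimension, truncated to length T.
    return np.stack([np.convolve(x, H[:, i])[:T] for i in range(d)], axis=1)
```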

References

Showing 1-10 of 23 references
Laika: A 5uW Programmable LSTM Accelerator for Always-on Keyword Spotting in 65nm CMOS
TLDR
The implementation of a KWS system using an LSTM accelerator designed in 65nm CMOS is presented, showing a power consumption of less than 5 µW for real-time KWS applications; approximate computing techniques further reduce power consumption while maintaining high accuracy and reliability.
14.1 A 510nW 0.41V Low-Memory Low-Computation Keyword-Spotting Chip Using Serial FFT-Based MFCC and Binarized Depthwise Separable Convolutional Neural Network in 28nm CMOS
TLDR
Ultra-low power is a strong requirement for always-on speech interfaces in wearable and mobile devices, such as Voice Activity Detection (VAD) and Keyword Spotting (KWS), yet high compute and memory requirements have prevented always-on KWS chips from operating in the sub-µW range.
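For orientation, the MFCC front-end this chip serializes is the standard pipeline: frame the audio, window it, take the FFT's power spectrum, apply a triangular mel filterbank, take logs, and decorrelate with a DCT. A plain NumPy sketch of that pipeline is below; the parameter defaults are common KWS choices, not the chip's exact configuration.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Standard MFCC pipeline: frame -> window -> |FFT|^2 -> mel filterbank
    -> log -> DCT. Assumes len(signal) >= n_fft."""
    # 1. Frame the signal and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    # 2. Power spectrum via the FFT (the chip serializes this step).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # 3. Triangular mel filterbank.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # 4. Log mel energies, then DCT to decorrelate.
    logmel = np.log(power @ fb.T + 1e-10)
    return dct(logmel, type=2, axis=1, norm='ortho')[:, :n_ceps]

# Example: 1 s of 16 kHz audio -> a (97, 13) coefficient matrix.
coeffs = mfcc(np.random.randn(16000))
```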
Vocell: A 65-nm Speech-Triggered Wake-Up SoC for 10-µW Keyword Spotting and Speaker Verification
TLDR
This article presents a complete mixed-signal system-on-chip, capable of directly interfacing to an analog microphone and performing keyword spotting (KWS) and speaker verification (SV), without any need for further external accesses.
Small-Footprint Keyword Spotting with Multi-Scale Temporal Convolution
TLDR
This paper proposes a multi-branch temporal convolution module (MTConv), a CNN block consisting of multiple temporal convolution filters with different kernel sizes, which enriches the temporal feature space, and a temporal efficient neural network (TENet) designed for KWS systems.
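A minimal PyTorch sketch of a multi-branch temporal convolution block in the spirit of MTConv follows; the branch kernel sizes and the summation used to fuse branches are illustrative assumptions, not the paper's exact MTConv definition.

```python
import torch
import torch.nn as nn

class MultiScaleTemporalConv(nn.Module):
    """Parallel 1-D convolutions with different kernel sizes over the
    time axis; branch outputs are summed, then normalized and activated."""
    def __init__(self, in_ch, out_ch, kernel_sizes=(3, 5, 7, 9)):
        super().__init__()
        # Odd kernel sizes with padding k // 2 preserve sequence length.
        self.branches = nn.ModuleList(
            nn.Conv1d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes
        )
        self.bn = nn.BatchNorm1d(out_ch)
        self.act = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        y = sum(branch(x) for branch in self.branches)
        return self.act(self.bn(y))

# Usage: 40 MFCC channels in, 64 feature channels out.
block = MultiScaleTemporalConv(40, 64)
feats = block(torch.randn(8, 40, 101))  # -> (8, 64, 101)
```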
Always-On, Sub-300-nW, Event-Driven Spiking Neural Network based on Spike-Driven Clock-Generation and Clock- and Power-Gating for an Ultra-Low-Power Intelligent Device
TLDR
A novel SNN classifier architecture for always-on functions is presented, demonstrating sub-300nW power consumption at competitive inference accuracy for KWS and other always-on classification workloads.
SRAM for Error-Tolerant Applications With Dynamic Energy-Quality Management in 28 nm CMOS
TLDR
A voltage-scaled SRAM for both error-free and error-tolerant applications is presented that dynamically manages the energy/quality trade-off based on application need and two variation-resilient techniques are selectively applied to bit positions having larger impact on the overall quality.
A 65 nm 1.0 V 1.84 ns Silicon-on-Thin-Box (SOTB) embedded SRAM with 13.72 nW/Mbit standby power for smart IoT
TLDR
A 65-nm Silicon-on-Thin-Box (SOTB) embedded SRAM with back-bias control in sleep mode is presented; up to 20% active read power reduction is achieved using the proposed localized adaptive wordline width control.
How to Achieve World-Leading Energy Efficiency using 22FDX with Adaptive Body Biasing on an Arm Cortex-M4 IoT SoC
TLDR
Racyics' adaptive body biasing enables LVT devices at 0.50V ULV conditions to meet the tough leakage power constraints of IoT devices.
Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
TLDR
Backpropagation through the ODE solver allows each layer to adapt its internal time-step, enabling the network to learn task-relevant time-scales and exceed state-of-the-art performance among RNNs on permuted sequential MNIST.
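The LMU's memory cell implements the Legendre delay system, whose continuous-time (A, B) matrices have the closed form given by Voelker et al. The sketch below constructs them and steps the memory with a simple Euler update; the Euler step is a stand-in for whatever discretization a given implementation uses.

```python
import numpy as np

def lmu_matrices(order: int, theta: float):
    """Closed-form (A, B) of the Legendre delay system of the given order,
    representing a sliding window of length theta (seconds)."""
    Q = np.arange(order)
    R = (2 * Q + 1)[:, None] / theta
    i, j = np.meshgrid(Q, Q, indexing='ij')
    A = R * np.where(i < j, -1.0, (-1.0) ** (i - j + 1))
    B = R * ((-1.0) ** Q)[:, None]
    return A, B  # A: (order, order), B: (order, 1)

# Euler-discretized memory update (ZOH is common too; Euler kept for brevity):
A, B = lmu_matrices(order=8, theta=0.5)
dt, m = 1e-3, np.zeros((8, 1))
for u in np.random.randn(100):
    m = m + dt * (A @ m + B * u)  # m holds a compressed window of u
```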
Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
TLDR
An audio dataset of spoken words designed to help train and evaluate keyword spotting systems and suggests a methodology for reproducible and comparable accuracy metrics for this task.
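The dataset is easy to obtain programmatically; one route, assuming torchaudio's bundled loader (the archive can also be downloaded directly from Google), is:

```python
import torchaudio

# Download Google Speech Commands (v0.02) and inspect one example.
dataset = torchaudio.datasets.SPEECHCOMMANDS(root=".", download=True)
waveform, sample_rate, label, speaker_id, utterance_number = dataset[0]
print(label, sample_rate, waveform.shape)  # 16 kHz, roughly 1-second clips
```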