Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition
@inproceedings{Ramsay2018LowDimensionalBF,
  title={Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition},
  author={David B. Ramsay and Kevin Kilgour and Dominik Roblek and Matthew Sharifi},
  booktitle={Interspeech},
  year={2018}
}
Low power digital signal processors (DSPs) typically have a very limited amount of memory in which to cache data. In this paper we develop efficient bottleneck feature (BNF) extractors that can be run on a DSP, and retrain a baseline large-vocabulary continuous speech recognition (LVCSR) system to use these BNFs with only a minimal loss of accuracy. The small BNFs allow the DSP chip to cache more audio features while the main application processor is suspended, thereby reducing the overall…
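The approach replaces conventional high-dimensional acoustic features with a compact learned representation small enough for the DSP's cache. As a rough illustration, here is a minimal PyTorch sketch of such a bottleneck extractor; the layer widths, context stacking, and 8-dimensional bottleneck are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class BottleneckFeatureExtractor(nn.Module):
    """Tiny feed-forward BNF extractor of the kind that could run on a
    low-power DSP. All sizes below are illustrative assumptions."""

    def __init__(self, n_mels: int = 80, context: int = 3, bnf_dim: int = 8):
        super().__init__()
        in_dim = n_mels * (2 * context + 1)  # center frame + context frames
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, bnf_dim),  # low-dimensional bottleneck output
        )

    def forward(self, stacked_frames: torch.Tensor) -> torch.Tensor:
        # stacked_frames: (batch, n_mels * (2 * context + 1))
        return self.net(stacked_frames)

# Why this saves cache: an 8-dim int8 BNF costs 8 bytes per frame versus
# 320 bytes for 80 float32 log-mel values, so the same DSP cache holds
# roughly 40x more audio history before the application processor wakes.
```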
2 Citations
On Training Targets and Activation Functions for Deep Representation Learning in Text-Dependent Speaker Verification
- Computer Science
- SSRN Electronic Journal
- 2022
This paper systematically studies the impact of training targets, activation functions, and loss functions on the performance of TD-SV, and experimentally shows that GELU significantly reduces the error rates of TD-SV compared to sigmoid, irrespective of the training target.
Feed-Forward Deep Neural Network (FFDNN)-Based Deep Features for Static Malware Detection
- Computer Science
- International Journal of Intelligent Systems
- 2023
Portable executable header (PEH) information is commonly used as a feature for malware detection systems; this work trains and validates machine learning (ML) and deep learning (DL) classifiers on deep features extracted from the hidden layers of a feed-forward deep neural network (FFDNN).
14 References
A Fixed-Point Neural Network Architecture for Speech Applications on Resource Constrained Hardware
- Computer Science
- J. Signal Process. Syst.
- 2018
This paper designs low-cost neural network architectures for keyword detection and speech recognition, presenting techniques to reduce memory requirements by scaling down the precision of weights and biases without compromising detection/recognition performance.
Compression of End-to-End Models
- Computer Science
- INTERSPEECH
- 2018
This work explores the problem of compressing end-to-end models with the goal of satisfying device constraints without sacrificing model accuracy, and evaluates matrix factorization, knowledge distillation, and parameter sparsity to determine the most effective methods given constraints such as a parameter budget.
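Of the compression methods evaluated there, knowledge distillation is easy to show compactly. Below is a hedged Python sketch of the standard distillation term; the temperature value and the T-squared scaling follow the usual Hinton-style convention and are assumptions, not details taken from that paper.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """KL divergence between temperature-softened teacher and student
    output distributions; the T*T factor keeps gradient magnitudes
    comparable across temperatures."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)
```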
Now Playing: Continuous low-power music recognition
- Computer Science
- ArXiv
- 2017
A low-power music recognizer is presented that automatically recognizes music without user interaction; by running entirely on-device, it respects user privacy and can passively recognize a wide range of music.
On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition
- Computer Science
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
This work presents a technique for general recurrent model compression that jointly compresses both recurrent and non-recurrent inter-layer weight matrices, and finds that the proposed technique reduces the size of a Long Short-Term Memory (LSTM) acoustic model to a third of its original size with negligible loss in accuracy.
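The joint compression in that work builds on low-rank factorization of weight matrices. The NumPy sketch below shows the basic single-matrix version via truncated SVD; the shapes and rank are illustrative assumptions, and the paper's method additionally shares projection matrices across layers, which is not reproduced here.

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int):
    """Approximate W (m x n) as A @ B with A (m x rank), B (rank x n),
    storing rank*(m+n) values instead of m*n."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    B = Vt[:rank, :]
    return A, B

W = np.random.randn(1024, 512)        # e.g. one inter-layer weight matrix
A, B = low_rank_factorize(W, rank=64)
print(W.size, A.size + B.size)        # 524288 vs 98304 stored parameters
```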
Convolutive Bottleneck Network features for LVCSR
- Business
- 2011 IEEE Workshop on Automatic Speech Recognition & Understanding
- 2011
A Convolutive Bottleneck Network is proposed as an extension of the current state-of-the-art Universal Context Network, leading to a 5.5% relative reduction in WER compared to the Universal Context ANN baseline.
State-of-the-Art Speech Recognition with Sequence-to-Sequence Models
- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
A variety of structural and optimization improvements to the Listen, Attend, and Spell model are explored, which significantly improve performance, and a multi-head attention architecture is introduced, which offers improvements over the commonly used single-head attention.
Lower Frame Rate Neural Network Acoustic Models
- Computer Science
- INTERSPEECH
- 2016
On a large vocabulary Voice Search task, it is shown that with conventional models, one can slow the frame rate to 40ms while improving WER by 3% relative over a CTC-based model, thus improving overall system speed.
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
- Computer Science
- 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2016
We present Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs or other components of traditional…
Librispeech: An ASR corpus based on public domain audio books
- Computer Science
- 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015
It is shown that acoustic models trained on LibriSpeech give lower error rates on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself.
Connectionist Temporal Classification
- Computer Science
- 2012
Experiments on speech and handwriting recognition show that a BLSTM network with a CTC output layer is an effective sequence labeller, generally outperforming standard HMMs and HMM-neural network hybrids, as well as more recent sequence labelling algorithms such as large margin HMMs and conditional random fields.
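For context, CTC trains a network by marginalizing over all alignments between frame-level outputs (including a blank symbol) and the target label sequence. A minimal usage sketch with PyTorch's built-in nn.CTCLoss follows; the tensor sizes here are arbitrary assumptions for the example.

```python
import torch
import torch.nn as nn

T, N, C = 50, 2, 28  # frames, batch size, label classes (blank = index 0)
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=2)
targets = torch.randint(1, C, (N, 20), dtype=torch.long)  # no blanks in targets
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 20, dtype=torch.long)

# The loss marginalizes over all valid blank-augmented alignments internally.
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # gradients flow back to the acoustic model outputs
```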