# Rectifier Nonlinearities Improve Neural Network Acoustic Models

@inproceedings{Maas2013RectifierNI, title={Rectifier Nonlinearities Improve Neural Network Acoustic Models}, author={Andrew L. Maas}, year={2013} }

Deep neural network acoustic models produce substantial gains in large vocabulary continuous speech recognition systems. Emerging work with rectified linear (ReL) hidden units demonstrates additional gains in final system performance relative to more commonly used sigmoidal nonlinearities. In this work, we explore the use of deep rectifier networks as acoustic models for the 300 hour Switchboard conversational speech recognition task. Using simple training procedures without pretraining…
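The two rectifier variants compared in the paper differ only in how they treat negative inputs: the standard ReLU zeroes them, while the leaky ReLU passes them through with a small slope so inactive units still receive gradient. A minimal NumPy sketch (function names and the slope value are illustrative, not taken from the paper):

```python
import numpy as np

def relu(x):
    """Standard rectified linear unit: max(0, x)."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: small non-zero slope alpha on the negative side,
    so the unit is never completely shut off during backpropagation.
    The slope 0.01 is illustrative, not the paper's tuned value."""
    return np.where(x > 0.0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # negative inputs clipped to zero
print(leaky_relu(x))  # negative inputs scaled by alpha
```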

## 5,006 Citations


### Nonlinear activations for convolutional neural network acoustic models

- Computer Science
- 2016

This work compares the per-frame state classification accuracy of several popular nonlinear activation functions for a small-scale CNN and finds that the leaky rectified linear unit and soft-plus function perform best by far, which suggests their potential in full-scale CNN acoustic models.
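The soft-plus function named above is a smooth approximation of the ReLU; a one-line NumPy sketch (our own, for illustration):

```python
import numpy as np

def softplus(x):
    # Soft-plus: log(1 + exp(x)), a smooth approximation of ReLU.
    # np.logaddexp(0, x) computes this without overflow for large x.
    return np.logaddexp(0.0, x)
```

For large positive x it approaches the identity, and at x = 0 it equals log 2, whereas ReLU would return 0.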

### Deep neural networks with linearly augmented rectifier layers for speech recognition

- Computer Science
- 2018 IEEE 16th World Symposium on Applied Machine Intelligence and Informatics (SAMI)
- 2018

This work combines the two approaches and proposes the very simple technique of composing the layers of the network both from rectified and linear neurons, which performs equivalently or slightly better than a maxout network when trained on a larger data set, while it is computationally simpler.

### Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition

- Computer Science
- ArXiv
- 2014

The results show that with sufficient training data, increasing DNN model size is an effective, direct path to performance improvements, and even smaller DNNs benefit from a larger training corpus.

### On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models

- Computer Science
- INTERSPEECH
- 2016

This study describes how successive nonlinear transformations are applied to the feature space non-uniformly when a deep neural network model learns categorical boundaries, which may partly explain their superior performance in pattern classification applications.

### Improving deep neural network acoustic models using generalized maxout networks

- Computer Science
- 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2014

This paper introduces two new types of generalized maxout units, called p-norm and soft-maxout, and presents a method to control instability when training nonlinearities with unbounded output.
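Maxout units take the maximum over a group of linear activations, and the p-norm unit generalizes this by taking the p-norm of the group, with maxout recovered in the limit p → ∞. A minimal NumPy sketch of both (function names and group size are our illustrative choices):

```python
import numpy as np

def maxout(z, group_size=2):
    # Maxout: each output is the max over a group of linear activations.
    return z.reshape(-1, group_size).max(axis=1)

def p_norm(z, group_size=2, p=2.0):
    # p-norm generalisation: y = (sum_i |z_i|^p)^(1/p) over each group;
    # maxout is the limit as p goes to infinity.
    g = np.abs(z.reshape(-1, group_size))
    return (g ** p).sum(axis=1) ** (1.0 / p)

z = np.array([1.0, -3.0, 2.0, 2.0])
print(maxout(z))  # max within each group of 2
print(p_norm(z))  # Euclidean norm within each group of 2
```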

### Deep maxout neural networks for speech recognition

- Computer Science
- 2013 IEEE Workshop on Automatic Speech Recognition and Understanding
- 2013

Experimental results demonstrate that maxout networks converge faster, generalize better, and are easier to optimize than rectified linear networks and sigmoid networks, and experiments show that maxout networks reduce underfitting and are able to achieve good results without dropout training.

### Parameterised sigmoid and reLU hidden activation functions for DNN acoustic modelling

- Computer Science
- INTERSPEECH
- 2015

This paper investigates generalised forms of both Sigmoid and ReLU with learnable parameters, as well as their integration with the standard DNN acoustic model training process, resulting in an average of 3.4% and 2.0% relative word error rate (WER) reduction with the Sigmoid and ReLU parameterisations, respectively.

### Building DNN acoustic models for large vocabulary speech recognition

- Computer Science
- Comput. Speech Lang.
- 2017

### Investigation of parametric rectified linear units for noise robust speech recognition

- Computer Science
- INTERSPEECH
- 2015

PReLU is a generalized version of LReLU in which the slope of the negative part is learned adaptively from the training data; it gives slightly better Word Error Rates (WERs) on noisy test sets compared to ReLU.
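What makes the slope learnable is that the unit's output is differentiable with respect to it: the gradient is zero on the positive side and equal to the input on the negative side. A minimal NumPy sketch (function names are our own, for illustration):

```python
import numpy as np

def prelu(x, alpha):
    # PReLU: identical in form to leaky ReLU, but alpha is a trainable
    # parameter rather than a fixed hyperparameter.
    return np.where(x > 0.0, x, alpha * x)

def prelu_grad_alpha(x):
    # d(output)/d(alpha): zero for positive inputs, x for negative ones,
    # which is what lets gradient descent fit alpha from the data.
    return np.where(x > 0.0, 0.0, x)
```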

### Convolutional deep maxout networks for phone recognition

- Computer Science
- INTERSPEECH
- 2014

Phone recognition tests on the TIMIT database show that switching to maxout units from rectifier units decreases the phone error rate for each network configuration studied, and yields relative error rate reductions of between 2% and 6%.

## References

Showing 1-10 of 14 references

### On rectified linear units for speech processing

- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013

This work shows that it can improve generalization and make training of deep networks faster and simpler by substituting the logistic units with rectified linear units.

### Improving deep neural networks for LVCSR using rectified linear units and dropout

- Computer Science
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013

Modelling deep neural networks with rectified linear unit (ReLU) non-linearities, with minimal human hyper-parameter tuning on a 50-hour English Broadcast News task, shows a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system.

### Deep Neural Networks for Acoustic Modeling in Speech Recognition

- Computer Science
- 2012

This paper provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using deep neural networks for acoustic modeling in speech recognition.

### Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks.

- Computer Science
- ICLR 2013
- 2013

This paper argues that the improved accuracy achieved by the DNNs is the result of their ability to extract discriminative internal representations that are robust to the many sources of variability in speech signals, and shows that these representations become increasingly insensitive to small perturbations in the input with increasing network depth.

### Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

- Computer Science
- IEEE Transactions on Audio, Speech, and Language Processing
- 2012

A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output, which can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs.

### Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization

- Computer Science
- INTERSPEECH
- 2012

A distributed neural network training algorithm, based on Hessian-free optimization, that scales to deep networks and large data sets and yields relative reductions in word error rate of 7-13% over cross-entropy training with stochastic gradient descent on two larger tasks: Switchboard and DARPA RATS noisy Levantine Arabic.

### Learning long-term dependencies with gradient descent is difficult

- Computer Science
- IEEE Trans. Neural Networks
- 1994

This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.

### ImageNet classification with deep convolutional neural networks

- Computer Science
- Commun. ACM
- 2012

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

### Sequence-discriminative training of deep neural networks

- Computer Science
- INTERSPEECH
- 2013

Different sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on a standard 300 hour American conversational telephone speech task.

### The Kaldi Speech Recognition Toolkit

- Computer Science
- 2011

The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.