• Corpus ID: 16489696

Rectifier Nonlinearities Improve Neural Network Acoustic Models

  • Andrew L. Maas
Deep neural network acoustic models produce substantial gains in large vocabulary continuous speech recognition systems. Emerging work with rectified linear (ReL) hidden units demonstrates additional gains in final system performance relative to more commonly used sigmoidal nonlinearities. In this work, we explore the use of deep rectifier networks as acoustic models for the 300 hour Switchboard conversational speech recognition task. Using simple training procedures without pretraining… 
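
As a rough illustration of the two nonlinearities the abstract contrasts, the sketch below compares a rectified linear unit with a logistic sigmoid and their derivatives. It uses only the standard textbook definitions; the function names are chosen here for illustration and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid; never larger than 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of the rectifier (0 is used at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


if __name__ == "__main__":
    for x in (-4.0, -1.0, 0.5, 4.0):
        print(x, relu(x), relu_grad(x), sigmoid(x), sigmoid_grad(x))
```

The sigmoid derivative shrinks toward zero for large positive or negative inputs, while the rectifier derivative stays at 1 for every active unit.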

Nonlinear activations for convolutional neural network acoustic models

This work compares the per-frame state classification accuracy of several popular nonlinear activation functions for a small-scale CNN and finds that the leaky rectified linear unit and soft-plus function perform best by far, which suggests their potential in full-scale CNN acoustic models.

Deep neural networks with linearly augmented rectifier layers for speech recognition

  • L. Tóth
  • Computer Science
    2018 IEEE 16th World Symposium on Applied Machine Intelligence and Informatics (SAMI)
  • 2018
This work combines the two approaches and proposes the very simple technique of composing the layers of the network both from rectified and linear neurons, which performs equivalently or slightly better than a maxout network when trained on a larger data set, while it is computationally simpler.

Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition

The results show that with sufficient training data, increasing DNN model size is an effective, direct path to performance improvements, and even smaller DNNs benefit from a larger training corpus.

On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models

This study describes how successive nonlinear transformations are applied to the feature space non-uniformly when a deep neural network model learns categorical boundaries, which may partly explain their superior performance in pattern classification applications.

Improving deep neural network acoustic models using generalized maxout networks

This paper introduces two new types of generalized maxout units, called p-norm and soft-maxout, and presents a method to control the instability that arises when training unbounded-output nonlinearities.
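
A minimal sketch of the two unit types named above, assuming the usual formulations: a p-norm unit outputs the p-norm of a group of linear activations, and a soft-maxout unit outputs the log-sum-exp of the group. The grouping, the default p = 2, and the function names are assumptions made for illustration; the summary itself does not specify them.

```python
import math
from typing import Sequence


def p_norm_unit(group: Sequence[float], p: float = 2.0) -> float:
    """p-norm of one group of activations: (sum |x_i|^p)^(1/p)."""
    return sum(abs(v) ** p for v in group) ** (1.0 / p)


def soft_maxout_unit(group: Sequence[float]) -> float:
    """Log-sum-exp of one group: a smooth upper bound on max(group)."""
    m = max(group)  # subtract the max before exponentiating for stability
    return m + math.log(sum(math.exp(v - m) for v in group))


if __name__ == "__main__":
    group = [3.0, -4.0]
    print(p_norm_unit(group), soft_maxout_unit(group), max(group))
```

Neither output is bounded above, which is the property the summary associates with training instability.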

Deep maxout neural networks for speech recognition

Experimental results demonstrate that maxout networks converge faster, generalize better, and are easier to optimize than rectified linear networks and sigmoid networks; the experiments also show that maxout networks reduce underfitting and achieve good results without dropout training.
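
For readers unfamiliar with the unit being compared, here is a small sketch of a single maxout unit under its standard definition: the maximum over several learned linear pieces of the input. The weights and biases below are hypothetical values picked only to show that a rectifier is the special case with two pieces, one of which is fixed at zero.

```python
from typing import Sequence


def maxout_unit(
    x: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> float:
    """Return the largest of the linear pieces w_k . x + b_k."""
    return max(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    )


if __name__ == "__main__":
    # Hypothetical parameters: piece 1 is the identity, piece 2 is zero,
    # so this unit behaves like a rectifier on a 1-D input.
    relu_like_w = [[1.0], [0.0]]
    relu_like_b = [0.0, 0.0]
    for value in (-3.0, 2.0):
        print(value, maxout_unit([value], relu_like_w, relu_like_b))
```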

Parameterised sigmoid and reLU hidden activation functions for DNN acoustic modelling

This paper investigates generalised forms of both Sigmoid and ReLU with learnable parameters, as well as their integration with the standard DNN acoustic model training process, and reports average relative word error rate (WER) reductions of 3.4% and 2.0% with the Sigmoid and ReLU parameterisations, respectively.

Building DNN acoustic models for large vocabulary speech recognition

Investigation of parametric rectified linear units for noise robust speech recognition

PReLU is a generalized version of LReLU in which the slope of the negative part is learned adaptively from the training data; it gives slightly better word error rates (WERs) on noisy test sets than ReLU.
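
The relationship between the two units can be sketched as follows, using their standard definitions. The fixed leaky slope of 0.01, the learning rate, and the helper names are assumptions for illustration, not values from the paper.

```python
def leaky_relu(x: float, slope: float = 0.01) -> float:
    """LReLU: a fixed small slope for negative inputs (0.01 assumed here)."""
    return x if x > 0 else slope * x


def prelu(x: float, a: float) -> float:
    """PReLU: same shape as LReLU, but the slope a is a trainable parameter."""
    return x if x > 0 else a * x


def prelu_grad_a(x: float) -> float:
    """Derivative of prelu(x, a) with respect to a: x when x < 0, else 0."""
    return x if x < 0 else 0.0


def sgd_step_on_slope(a: float, x: float, upstream: float, lr: float = 0.1) -> float:
    """One gradient-descent update of the slope, given the upstream gradient."""
    return a - lr * upstream * prelu_grad_a(x)


if __name__ == "__main__":
    a = 0.25
    print(prelu(-2.0, a), leaky_relu(-2.0))
    print(sgd_step_on_slope(a, x=-2.0, upstream=1.0))
```

Only negative inputs contribute to the slope update, so the slope adapts to how the unit is used when it would otherwise be inactive.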

Convolutional deep maxout networks for phone recognition

Phone recognition tests on the TIMIT database show that switching to maxout units from rectifier units decreases the phone error rate for each network configuration studied, and yields relative error rate reductions of between 2% and 6%.



On rectified linear units for speech processing

This work shows that substituting rectified linear units for logistic units improves generalization and makes training of deep networks faster and simpler.

Improving deep neural networks for LVCSR using rectified linear units and dropout

Modelling deep neural networks with rectified linear unit (ReLU) non-linearities with minimal human hyper-parameter tuning on a 50-hour English Broadcast News task shows a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system.

Deep Neural Networks for Acoustic Modeling in Speech Recognition

This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.

Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks.

This paper argues that the improved accuracy achieved by the DNNs is the result of their ability to extract discriminative internal representations that are robust to the many sources of variability in speech signals, and shows that these representations become increasingly insensitive to small perturbations in the input with increasing network depth.

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output, and that can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs.

Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization

A distributed neural network training algorithm, based on Hessian-free optimization, that scales to deep networks and large data sets and yields relative reductions in word error rate of 7–13% over cross-entropy training with stochastic gradient descent on two larger tasks: Switchboard and DARPA RATS noisy Levantine Arabic.

Learning long-term dependencies with gradient descent is difficult

This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.

ImageNet classification with deep convolutional neural networks

A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

Sequence-discriminative training of deep neural networks

Different sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on a standard 300 hour American conversational telephone speech task.

The Kaldi Speech Recognition Toolkit

The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.