Corpus ID: 9498408

Robust CNN-based speech recognition with Gabor filter kernels

@inproceedings{Chang2014RobustCS,
  title={Robust CNN-based speech recognition with Gabor filter kernels},
  author={Shuo-yiin Chang and Nelson Morgan},
  booktitle={INTERSPEECH},
  year={2014}
}
As has been extensively shown, acoustic features for speech recognition can be learned from neural networks with multiple hidden layers. In this architecture, a variety of Gabor features served as the multiple feature maps of the convolutional layer. The filter coefficients are further tuned by back-propagation training. Experiments used two noisy versions of the WSJ corpus: Aurora 4, and RATS re-noised WSJ. In both cases, the proposed architecture performs better than other noise-robust…
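The architecture summarized above uses Gabor filters as the initial kernels of a convolutional layer, which are then fine-tuned by back-propagation. A minimal sketch of constructing such a Gabor kernel bank is given below; the `gabor_kernel` function and all parameter values are illustrative assumptions, not the paper's exact spectro-temporal parameterization.

```python
import numpy as np

def gabor_kernel(size, theta, freq, sigma=2.0, psi=0.0):
    """Real part of a 2D Gabor filter: a Gaussian envelope times a sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates by the orientation theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * freq * xr + psi)
    return envelope * carrier

# A small bank of Gabor kernels at several orientations and frequencies.
# Kernels like these could serve as the initial feature maps of a
# convolutional layer, to be fine-tuned by back-propagation.
bank = np.stack([
    gabor_kernel(size=9, theta=t, freq=f)
    for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)
    for f in (0.1, 0.25)
])
print(bank.shape)  # (8, 9, 9)
```

In a speech setting, each such 2D kernel would be convolved over a time-frequency representation (e.g. a mel spectrogram) rather than an image, so the two axes of the kernel correspond to temporal and spectral modulation.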


Improving Audio-Visual Speech Recognition Using Gabor Recurrent Neural Networks
TLDR
The experimental results show that the deep Gabor LSTM-BRNN-based model achieves superior performance compared to GMM-HMM-based models that use the same front-ends.
An analysis of convolutional neural networks for speech recognition
  • J. Huang, Jinyu Li, Y. Gong
  • Computer Science
    2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
TLDR
By visualizing the localized filters learned in the convolutional layer, it is shown that edge detectors in varying directions can be automatically learned and it is established that the CNN structure combined with maxout units is the most effective model under small-sizing constraints for the purpose of deploying small-footprint models to devices.
Gabor Filter Incorporated CNN for Compression
  • Akihiro Imamura, N. Arizumi
  • Computer Science
    2021 36th International Conference on Image and Vision Computing New Zealand (IVCNZ)
  • 2021
TLDR
Gabor filters are incorporated into the earlier layers of CNNs for compression; it is shown that while the first layer of VGG-16 for CIFAR-10 has 192 kernels/features, learning Gabor filters requires an average of only 29.4 kernels.
Novel Unsupervised Auditory Filterbank Learning Using Convolutional RBM for Speech Recognition
TLDR
The theory, training algorithm, and detailed analysis of learned filterbank show that the proposed unsupervised learning model based on convolutional restricted Boltzmann machine (RBM) with rectified linear units performs better than traditional MFCC and Mel-filterbank features for both clean and multicondition automatic speech recognition (ASR) tasks.
Increasing the robustness of CNN acoustic models using ARMA spectrogram features and channel dropout
TLDR
This work proposes an improved version of input dropout, which exploits the special structure of the input time-frequency representation, and replaced the standard mel-spectrogram input representation with the autoregressive moving average (ARMA) spectrogram, which was recently shown to outperform the former under mismatched train-test conditions.
Speech Recognition in Noisy Environments with Convolutional Neural Networks
TLDR
The use of convolutional neural networks (CNN) as acoustic models in automatic speech recognition systems (ASR) is proposed as an alternative to the classical recognition methods based on HMM without any noise-robust method applied.
Convolutional neural networks for acoustic modeling of raw time signal in LVCSR
TLDR
It is shown that the performance gap between DNNs trained on spliced hand-crafted features and DNNs trained on the raw time signal can be strongly reduced by introducing 1D-convolutional layers.
Gabor filter assisted energy efficient fast learning Convolutional Neural Networks
TLDR
This work reduces the training complexity of CNNs by replacing certain weight kernels of a CNN with Gabor filters, which creates a balanced system that gives better training performance in terms of energy and time, compared to the standalone CNN (without any Gabor kernels), in exchange for tolerable accuracy degradation.

References

SHOWING 1-10 OF 28 REFERENCES
Informative spectro-temporal bottleneck features for noise-robust speech recognition
TLDR
This work improves PNS-Gabor MLP in two ways: first, informative Gabor features are selected using sparse principal component analysis (sparse PCA) before tandem processing, and second, a deep neural network (DNN) with a bottleneck structure is used.
Exploring convolutional neural network structures and optimization techniques for speech recognition
TLDR
This paper investigates several CNN architectures, including full and limited weight sharing, convolution along frequency and time axes, and stacking of several convolution layers, and develops a novel weighted softmax pooling layer so that the size in the pooled layer can be automatically learned.
Deep convolutional neural networks for LVCSR
TLDR
This paper determines the appropriate architecture to make CNNs effective compared to DNNs for LVCSR tasks, and explores the behavior of neural network features extracted from CNNs on a variety of LVCSR tasks, comparing CNNs to DNNs and GMMs.
Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
TLDR
An amplitude modulation feature derived from Teager's nonlinear energy operator is power-normalized and cosine-transformed to produce normalized modulation cepstral coefficient (NMCC) features, which demonstrated noise robustness in almost all training-test conditions of re-noised WSJ data and improved digit recognition accuracies on Aurora-2 compared to MFCCs and state-of-the-art noise-robust features.
An investigation of deep neural networks for noise robust speech recognition
TLDR
The noise robustness of DNN-based acoustic models can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation and can be further improved by incorporating information about the environment into DNN training using a new method called noise-aware training.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
TLDR
This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Localized spectro-temporal features for automatic speech recognition
TLDR
It is argued that including LSTF streams provides another step toward human-like speech recognition, and evidence is presented for (spectro-)temporal processing in the auditory system.
Noise adaptive training using a vector taylor series approach for noise robust automatic speech recognition
TLDR
A noise adaptive training (NAT) algorithm is proposed that can be applied to all training data and normalizes the environmental distortion as part of model training; it learns pseudo-clean model parameters that are later used with vector Taylor series model adaptation to decode noisy utterances at test time.
Tandem connectionist feature extraction for conventional HMM systems
  • H. Hermansky, D. Ellis, Sangita Sharma
  • Computer Science
    2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
  • 2000
TLDR
A large improvement in word recognition performance is shown by combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling.
Gradient-based learning applied to document recognition
TLDR
This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task, and Convolutional neural networks are shown to outperform all other techniques.