Framewise phoneme classification with bidirectional LSTM and other neural network architectures

@article{Graves2005FramewisePC,
  title={Framewise phoneme classification with bidirectional LSTM and other neural network architectures},
  author={Alex Graves and J{\"u}rgen Schmidhuber},
  journal={Neural Networks},
  year={2005},
  volume={18},
  number={5-6},
  pages={602--610}
}
  • A. Graves, J. Schmidhuber
  • Published 1 June 2005
  • Computer Science
  • Neural Networks: the official journal of the International Neural Network Society
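
In the framewise setting this paper studies, every acoustic frame receives its own phone label, and a bidirectional LSTM lets each frame's prediction draw on both past and future context. As a rough illustration only, here is a minimal PyTorch sketch of such a classifier; the feature dimension, hidden size, and 61-class phone inventory are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal framewise phoneme classifier built on a bidirectional LSTM.
# All sizes below are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class FramewiseBLSTM(nn.Module):
    def __init__(self, n_features=26, hidden=93, n_phones=61):
        super().__init__()
        # bidirectional=True runs one LSTM forward and one backward in
        # time and concatenates their hidden states at every frame.
        self.blstm = nn.LSTM(n_features, hidden, batch_first=True,
                             bidirectional=True)
        self.classify = nn.Linear(2 * hidden, n_phones)

    def forward(self, x):             # x: (batch, frames, n_features)
        out, _ = self.blstm(x)        # out: (batch, frames, 2 * hidden)
        return self.classify(out)     # per-frame phone logits

model = FramewiseBLSTM()
frames = torch.randn(4, 100, 26)          # dummy acoustic feature frames
logits = model(frames)                    # (4, 100, 61)
targets = torch.randint(0, 61, (4, 100))  # dummy per-frame phone labels
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 61), targets.reshape(-1))
```

Training then reduces to minimizing per-frame cross-entropy, which is what "framewise classification" means in contrast to the sequence-level criteria used by full recognizers.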

Citations

Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition
TLDR
In this paper, two experiments on the TIMIT speech corpus with bidirectional and unidirectional Long Short Term Memory networks are carried out and it is found that a hybrid BLSTM-HMM system improves on an equivalent traditional HMM system.
Keyword Spotting with Long Short-term Memory Neural Network Architectures
TLDR
An improved residual LSTM model, in which a spatial shortcut path connected from lower layers to the output of the memory cell is added, is put forward in this work, and it is shown that LSTMP brings quick convergence to various LSTM models without a decrease in accuracy.
Persian phoneme recognition using long short-term memory neural network
  • M. Daneshvar, H. Veisi
  • Computer Science
    2016 Eighth International Conference on Information and Knowledge Technology (IKT)
  • 2016
TLDR
This paper applies a Long Short-Term Memory (LSTM) network to Persian phoneme recognition and finds that both LSTM and deep LSTM outperform HMM on this task.
A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding
TLDR
This work proposes BLSTM-RNN as a unified tagging solution that can be applied to various tagging tasks, including part-of-speech tagging, chunking, and named entity recognition, requiring no task-specific knowledge or sophisticated feature engineering.
LSTM-Based Language Models for Spontaneous Speech Recognition
TLDR
LSTM-LMs trained with regularization were used to rescore recognition word lattices and obtained much lower WER than n-gram and conventional RNN-based LMs for the Russian and English languages.
Analysis of memory in LSTM-RNNs for source separation
TLDR
A memory reset approach is applied to the task of multi-speaker source separation; the analysis finds a strong performance effect of short-term linguistic processes and confirms that, performance-wise, it is sufficient to implement longer memory in deeper layers.
Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks
TLDR
This paper proposes a sequence-based conversion method using DBLSTM-RNNs to model not only the frame-wise relationship between the source and the target voice, but also the long-range context dependencies in the acoustic trajectory.
Echo State vs. LSTM Networks for Word Sense Disambiguation
TLDR
The two modelling approaches are compared on the word sense disambiguation (WSD) task in NLP; the main advantages of BiESN over BiLSTM are the smaller number of trainable parameters and a simpler training algorithm.
Multi-task learning of structured output layer bidirectional LSTMs for speech synthesis
TLDR
This work improves the conventional BLSTM-RNN-based approach by introducing a multi-task-learned structured output layer where spectral parameter targets are conditioned on pitch parameter prediction.
LSTM Neural Networks for Language Modeling
TLDR
This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.

References

Showing 1-10 of 32 references
Framewise phoneme classification with bidirectional LSTM networks
  • A. Graves, J. Schmidhuber
  • Computer Science
    Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005.
  • 2005
TLDR
It is found that bidirectional LSTM outperforms both RNNs and unidirectional LSTM; the significance of framewise phoneme classification for continuous speech recognition and the validity of using bidirectional networks for online causal tasks are discussed.
Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition
TLDR
In this paper, two experiments on the TIMIT speech corpus with bidirectional and unidirectional Long Short Term Memory networks are carried out and it is found that a hybrid BLSTM-HMM system improves on an equivalent traditional HMM system.
Rapid Retraining on Speech Data with LSTM Recurrent Networks.
TLDR
This report partitions the TIDIGITS database into utterances spoken by men, women, boys, and girls, successively retrains a Long Short-Term Memory (LSTM) RNN on them, and finds that the network rapidly adapts to new subsets of the data and achieves greater accuracy than when trained on them from scratch.
Biologically Plausible Speech Recognition with LSTM Neural Nets
TLDR
It is concluded that LSTM should be further investigated as a biologically plausible basis for a bottom-up, neural net-based approach to speech recognition.
Phoneme boundary estimation using bidirectional recurrent neural networks and its applications
TLDR
Experimental results showed that the proposed method estimates segment boundaries significantly better than an HMM- or multilayer-perceptron-based method; the BRNN-based segment boundary estimator was also incorporated into HMM-based and segment-model-based recognition systems.
Bidirectional recurrent neural networks
TLDR
It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.
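
The bidirectional structure described here amounts to two unidirectional recurrences, one over the sequence as given and one over its reversal, whose hidden states are combined at every step. A minimal hand-built sketch of that combination in PyTorch (layer sizes are arbitrary):

```python
# Building a bidirectional layer from two unidirectional RNNs by hand,
# to make the structure explicit. torch.flip reverses the time axis.
import torch
import torch.nn as nn

fwd = nn.RNN(10, 16, batch_first=True)   # reads frames t = 1 .. T
bwd = nn.RNN(10, 16, batch_first=True)   # reads frames t = T .. 1

x = torch.randn(2, 7, 10)                # (batch, time, features)
h_fwd, _ = fwd(x)
h_bwd, _ = bwd(torch.flip(x, dims=[1]))
h_bwd = torch.flip(h_bwd, dims=[1])      # re-align backward states in time
h = torch.cat([h_fwd, h_bwd], dim=-1)    # each step sees past and future
```

Because each output step depends on the entire input sequence, the structure suits offline tasks; for online causal tasks only the forward half is available.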
A comparison between spiking and differentiable recurrent neural networks on spoken digit recognition
TLDR
It is found that LSTM gives results greatly superior to an SNN from the literature, and it is concluded that the architecture has a place in domains that require learning large time-warped datasets, such as automatic speech recognition.
Learning Precise Timing with LSTM Recurrent Networks
TLDR
This work finds that LSTM augmented by "peephole connections" from its internal cells to its multiplicative gates can learn the fine distinction between sequences of spikes spaced either 50 or 49 time steps apart without the help of any short training exemplars.
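
For reference, the "peephole connections" mentioned here give the gates direct access to the cell state, so gating can be conditioned on the cell's contents rather than only on the gated cell output. A common rendering of the resulting update equations, which may differ in notation from the paper's:

\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + p_i \odot c_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + p_f \odot c_{t-1} + b_f) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + p_o \odot c_t + b_o) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}

The input and forget gates peek at the previous cell state c_{t-1}, while the output gate peeks at the freshly updated c_t; this is what lets the network learn precise spike timings such as the 49- vs. 50-step distinction described above.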
Long Short-Term Memory
TLDR
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Experiments on the implementation of recurrent neural networks for speech phone recognition
  • Ruxin Chen, L. Jamieson
  • Computer Science
    Conference Record of The Thirtieth Asilomar Conference on Signals, Systems and Computers
  • 1996
TLDR
This work presents an extensive set of experiments exploring training methods and criteria for recurrent neural networks (RNNs) used for speech phone recognition, and proposes a new criterion function that allows direct minimization of the frame error rate.