Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

@inproceedings{Graves2006ConnectionistTC,
  title={Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks},
  author={Alex Graves and Santiago Fern{\'a}ndez and Faustino J. Gomez and J{\"u}rgen Schmidhuber},
  booktitle={Proceedings of the 23rd International Conference on Machine Learning},
  year={2006}
}
Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. […] Key result: an experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
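
The CTC objective summarized above sums over all possible alignments between an unsegmented input sequence and its label sequence, so the network can be trained without frame-level segmentation. As a rough illustration only, here is a minimal sketch using PyTorch's nn.CTCLoss, a later library implementation of the same idea; the dimensions, tensors, and variable names are made-up placeholders, not values from the paper.

import torch
import torch.nn as nn

# Illustrative dimensions (not from the paper): T time steps, N sequences, C labels
T, N, C = 50, 4, 20                                    # label index 0 reserved for the CTC blank
logits = torch.randn(T, N, C, requires_grad=True)      # stand-in for RNN outputs over time
log_probs = logits.log_softmax(2)                      # CTC expects per-frame log-probabilities
targets = torch.randint(1, C, (N, 20), dtype=torch.long)        # unsegmented label sequences (padded)
input_lengths = torch.full((N,), T, dtype=torch.long)           # frames per input sequence
target_lengths = torch.randint(5, 20, (N,), dtype=torch.long)   # labels per target sequence

ctc = nn.CTCLoss(blank=0)    # marginalises over all alignments between frames and labels
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()              # gradients flow back into the network; no frame alignment needed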

Citations

Training LDCRF model on unsegmented sequences using connectionist temporal classification
TLDR
Experimental results on two gesture recognition tasks show that the proposed method outperforms LDCRFs, hidden Markov models, and conditional random fields.
Supervised Sequence Labelling with Recurrent Neural Networks
  • A. Graves
  • Computer Science
    Studies in Computational Intelligence
  • 2008
TLDR
A new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.
Temporal Attention-Gated Model for Robust Sequence Classification
TLDR
The Temporal Attention-Gated Model (TAGM) is presented which integrates ideas from attention models and gated recurrent networks to better deal with noisy or unsegmented sequences.
Modeling intra-label dynamics in connectionist temporal classification
TLDR
This paper proposes to model each label with a sequence of hidden sub-labels at the CTC level, which lets CTC learn intra-label relations and transfers part of the load of learning dynamical sequences from the RNN to CTC.
Parallel sequence classification using recurrent neural networks and alignment
TLDR
A new model called the class-less classifier is proposed, cognitively motivated by a simplified view of infant learning; it learns not only the semantic association but also the relation between labels and classes.
Probabilistic ASR feature extraction applying context-sensitive connectionist temporal classification networks
TLDR
In challenging ASR scenarios involving highly spontaneous, disfluent, and noisy speech, the BN-CTC front-end leads to remarkable word accuracy improvements and prevails over a series of previously introduced BLSTM-based ASR systems.
Sequence-discriminative training of recurrent neural networks
TLDR
It is shown that although recurrent neural networks already make use of the whole observation sequence and are able to incorporate more contextual information than feed forward networks, their performance can be improved with sequence-discriminative training.
Learning acoustic frame labeling for speech recognition with recurrent neural networks
  • H. Sak, A. Senior, J. Schalkwyk
  • Computer Science, Physics
    2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
TLDR
It is shown that a bidirectional LSTM RNN CTC model using phone units can perform as well as an LSTM RNN model trained with cross-entropy (CE) on HMM state alignments; the effect of sequence-discriminative training on these models is also examined.
Segmental Recurrent Neural Networks
TLDR
Experiments on handwriting recognition and joint Chinese word segmentation/POS tagging show that segmental recurrent neural networks obtain substantially higher accuracies compared to models that do not explicitly represent segments.
End-to-End Sequence Labeling via Convolutional Recurrent Neural Network with a Connectionist Temporal Classification Layer
TLDR
A novel deep neural network architecture is proposed that combines convolutional, pooling and recurrent layers in a unified framework, constructing a convolutional recurrent neural network (CRNN) for sequence labelling tasks with variable-length inputs and outputs.

References

Showing 1-10 of 24 references
Bidirectional recurrent neural networks
TLDR
It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.
Learning Precise Timing with LSTM Recurrent Networks
TLDR
This work finds that LSTM augmented by "peephole connections" from its internal cells to its multiplicative gates can learn the fine distinction between sequences of spikes spaced either 50 or 49 time steps apart without the help of any short training exemplars.
Temporal classification: extending the classification paradigm to multivariate time series
TLDR
A temporal learner capable of producing comprehensible and accurate classifiers for multivariate time series that can learn from a small number of instances and can integrate non-temporal features, and a feature construction technique that parameterises sub-events of the training set and clusters them to construct features for a propositional learner.
Bidirectional LSTM Networks for Improved Phoneme Classification and Recognition
TLDR
In this paper, two experiments on the TIMIT speech corpus with bidirectional and unidirectional Long Short Term Memory networks are carried out and it is found that a hybrid BLSTM-HMM system improves on an equivalent traditional HMM system.
Connectionist Speech Recognition: A Hybrid Approach
From the Publisher: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous speech recognition systems.
Neural Networks: Tricks of the Trade
TLDR
It is shown how nonlinear semi-supervised embedding algorithms popular for use with “shallow” learning techniques such as kernel methods can be easily applied to deep multi-layer architectures.
Long Short-Term Memory
TLDR
A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Markovian Models for Sequential Data
TLDR
The basics of HMMs are summarized, and several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, are reviewed.
Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition
  • J. Bridle
  • Computer Science
    NATO Neurocomputing
  • 1989
TLDR
Two modifications are explained: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic non-linearity of feed-forward non-linear networks with multiple outputs.
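
As a concrete illustration of the normalised exponential (softmax) described in the Bridle reference just above, here is a minimal NumPy sketch; the max-subtraction step is a standard numerical-stability trick assumed here, not something stated in that summary.

import numpy as np

def softmax(z):
    # Normalised exponential: maps arbitrary scores to a probability distribution
    z = z - np.max(z)        # subtracting the max leaves the result unchanged but avoids overflow
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # -> roughly [0.66, 0.24, 0.10]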