Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

  title={Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks},
  author={Alex Graves and Santiago Fern{\'a}ndez and Faustino J. Gomez and J{\"u}rgen Schmidhuber},
  journal={Proceedings of the 23rd international conference on Machine learning},
Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. [] Key Result An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.

Figures from this paper

Supervised Sequence Labelling with Recurrent Neural Networks

  • A. Graves
  • Computer Science
    Studies in Computational Intelligence
  • 2008
A new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.

Temporal Attention-Gated Model for Robust Sequence Classification

The Temporal Attention-Gated Model (TAGM) is presented which integrates ideas from attention models and gated recurrent networks to better deal with noisy or unsegmented sequences.

Modeling intra-label dynamics in connectionist temporal classification

This paper proposes to model each label with a sequence of hidden sub-labels at the CTC level, which allows CTC to learn the intra-label relations which transfers part of the load of learning dynamical sequences from RNN to CTC.

Parallel sequence classification using recurrent neural networks and alignment

A new model called class-less classifier is proposed, which is cognitive motivated by a simplified version of the infants learning, which not only learns the semantic association but also learns the relation between the labels and the classes.

Probabilistic asr feature extraction applying context-sensitive connectionist temporal classification networks

In challenging ASR scenarios involving highly spontaneous, disfluent, and noisy speech, the BN-CTC front-end leads to remarkable word accuracy improvements and prevails over a series of previously introduced BLSTM-based ASR systems.

Sequence-discriminative training of recurrent neural networks

It is shown that although recurrent neural networks already make use of the whole observation sequence and are able to incorporate more contextual information than feed forward networks, their performance can be improved with sequence-discriminative training.

Learning acoustic frame labeling for speech recognition with recurrent neural networks

  • H. SakA. Senior J. Schalkwyk
  • Computer Science, Physics
    2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2015
It is shown that a bidirectional LSTM RNN CTC model using phone units can perform as well as an LSTm RNN model trained with CE using HMM state alignments, and the effect of sequence discriminative training on these models is shown.

Segmental Recurrent Neural Networks

Experiments on handwriting recognition and joint Chinese word segmentation/POS tagging show that segmental recurrent neural networks obtain substantially higher accuracies compared to models that do not explicitly represent segments.

End-to-End Sequence Labeling via Convolutional Recurrent Neural Network with a Connectionist Temporal Classification Layer

A kind of novel deep neural network architecture which combines convolution, pooling and recurrent in a unified framework to construct the convolutional recurrent neural network (CRNN) for sequence labeling tasks with variable lengths of input and output is proposed.

Online Sequence Training of Recurrent Neural Networks with Connectionist Temporal Classification

An expectation-maximization (EM) based online CTC algorithm is introduced that enables unidirectional RNNs to learn sequences that are longer than the amount of unrolling and can also be trained to process an infinitely long input sequence without pre-segmentation or external reset.



Bidirectional recurrent neural networks

It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.

Temporal classification: extending the classification paradigm to multivariate time series

A temporal learner capable of producing comprehensible and accurate classifiers for multivariate time series that can learn from a small number of instances and can integrate non-temporal features, and a feature construction technique that parameterises sub-events of the training set and clusters them to construct features for a propositional learner.

Connectionist Speech Recognition: A Hybrid Approach

From the Publisher: Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous

Neural Networks: Tricks of the Trade

It is shown how nonlinear semi-supervised embedding algorithms popular for use with â œshallowâ learning techniques such as kernel methods can be easily applied to deep multi-layer architectures.

Long Short-Term Memory

A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Markovian Models for Sequential Data

The basics ofHMMs are summarized and several recent related learning algorithms and extensions of HMMs including in particular hybrids of HM Ms with arti cial neural networks are reviewed.

Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition

  • J. Bridle
  • Computer Science
    NATO Neurocomputing
  • 1989
Two modifications are explained: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic non- linearity of feed-forward non-linear networks with multiple outputs.

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

An application of recurrent nets to phone probability estimation

Recognition results are presented for the DARPA TIMIT and Resource Management tasks, and it is concluded that recurrent nets are competitive with traditional means for performing phone probability estimation.