Corpus ID: 246411701

Star Temporal Classification: Sequence Classification with Partially Labeled Data

Vineel Pratap, Awni Y. Hannun, Gabriel Synnaeve, Ronan Collobert
We develop an algorithm which can learn from partially labeled and unsegmented sequential data. Most sequential loss functions, such as Connectionist Temporal Classification (CTC), break down when many labels are missing. We address this problem with Star Temporal Classification (STC) which uses a special star token to allow alignments which include all possible tokens whenever a token could be missing. We express STC as the composition of weighted finite-state transducers (WFSTs) and use GTN (a…
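The star mechanism can be sketched without full WFST machinery. Below is a toy pure-Python forward pass (an illustration under simplifying assumptions, not the paper's GTN implementation; the name `stc_forward` is invented here): it scores all monotonic alignments of a partial label sequence to the frames, with an optional star segment allowed before, between, and after each given label. The star's emission score is a LogSumExp over the whole vocabulary, and CTC blanks are omitted for brevity.

```python
import math

NEG_INF = float("-inf")

def logsumexp(*xs):
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def stc_forward(log_probs, labels):
    """Total log-score of all monotonic alignments of `labels` (assumed
    non-empty) to the frames in `log_probs`, where an optional <star>
    segment -- matching any token for any number of frames -- may occur
    before, between, and after the given labels."""
    T, L = len(log_probs), len(labels)
    S = 2 * L + 1  # states alternate: star_0, y_1, star_1, ..., y_L, star_L

    def emit(t, s):
        if s % 2 == 0:  # star state: accepts any token this frame
            return logsumexp(*log_probs[t])
        return log_probs[t][labels[s // 2]]

    alpha = [[NEG_INF] * S for _ in range(T)]
    alpha[0][0] = emit(0, 0)  # start in the leading star ...
    alpha[0][1] = emit(0, 1)  # ... or directly on the first label
    for t in range(1, T):
        for s in range(S):
            prev = [alpha[t - 1][s]]              # self-loop
            if s >= 1:
                prev.append(alpha[t - 1][s - 1])  # advance one state
            if s % 2 == 1 and s >= 2:
                prev.append(alpha[t - 1][s - 2])  # label -> label, skipping the star
            alpha[t][s] = logsumexp(*prev) + emit(t, s)
    # finish on the last label or in the trailing star
    return logsumexp(alpha[T - 1][S - 1], alpha[T - 1][S - 2])
```

The star states' self-loops play the role of the star token's self-loop in the label-side WFST: they absorb any run of unlabeled frames without forcing a specific token.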
2 Citations


Multi-blank Transducers for Speech Recognition

It is shown that multi-blank RNN-T methods could bring relative speedups of over +90%/+139% to model inference for the English LibriSpeech and German Multilingual LibriSpeech datasets, respectively, and consistently improve ASR accuracy.

Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks

The Bayes risk CTC (BRCTC) criterion is proposed in this work, in which a customizable Bayes risk function is adopted to enforce desired characteristics of the predicted alignment, yielding an improved performance-latency trade-off for online models.



References

Multi-label Connectionist Temporal Classification

This work presents a novel Multi-label Connectionist Temporal Classification (MCTC) loss function for multi-label, sequence-to-sequence classification and achieves state-of-the-art results on joint Handwritten Text Recognition and Named Entity Recognition, Asian Character Recognition, and OMR.

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

This paper presents a novel method for training RNNs to label unsegmented sequences directly, removing the need for pre-segmented training data and post-processed outputs.
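For reference, the standard CTC forward recursion that STC generalizes can be written in a few lines. The sketch below is a minimal pure-Python illustration (quadratic and unvectorized, not an efficient implementation), assuming a non-empty label sequence:

```python
import math

def ctc_loss(log_probs, labels, blank=0):
    """Negative log-likelihood of `labels` under the CTC forward recursion:
    a toy pure-Python sketch, with `log_probs[t][k]` the log-probability of
    token k at frame t."""
    NEG_INF = float("-inf")

    def lse(a, b):
        if a == NEG_INF:
            return b
        if b == NEG_INF:
            return a
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    ext = [blank]
    for y in labels:          # interleave blanks: b, y1, b, y2, ..., b
        ext += [y, blank]
    S, T = len(ext), len(log_probs)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            a = alpha[s]                      # stay on the same state
            if s >= 1:
                a = lse(a, alpha[s - 1])      # advance one state
            # skip the blank between two *different* labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = lse(a, alpha[s - 2])
            new[s] = a + log_probs[t][ext[s]]
        alpha = new
    # end on the final label or the trailing blank
    return -lse(alpha[S - 1], alpha[S - 2])
```

This is the recursion that breaks down under missing labels: every token of the transcript must be traversed, which is exactly what STC's star states relax.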

Semi-Supervised Speech Recognition Via Graph-Based Temporal Classification

Results show that this approach can effectively exploit an N-best list of pseudo-labels with associated scores, considerably outperforming standard pseudo-labeling, with ASR results approaching an oracle experiment in which the best hypotheses of the N-best lists are selected manually.

Temporal classification: extending the classification paradigm to multivariate time series

A temporal learner capable of producing comprehensible and accurate classifiers for multivariate time series that can learn from a small number of instances and can integrate non-temporal features, and a feature construction technique that parameterises sub-events of the training set and clusters them to construct features for a propositional learner.

Connectionist Temporal Modeling for Weakly Supervised Action Labeling

The Extended Connectionist Temporal Classification (ECTC) framework is introduced to efficiently evaluate all possible alignments via dynamic programming and explicitly enforce their consistency with frame-to-frame visual similarities.

Differentiable Weighted Finite-State Transducers

A framework for automatic differentiation with weighted finite-state transducers (WFSTs) allowing them to be used dynamically at training time and a convolutional WFST layer which maps lower-level representations to higher-level representations and can be used as a drop-in replacement for a traditional convolution.

Word Order does not Matter for Speech Recognition

A word-level acoustic model which aggregates the distribution of all output frames using a LogSumExp operation and uses a cross-entropy loss to match the ground-truth word distribution, achieving 2.3%/4.6% on the test-clean/test-other subsets of LibriSpeech, which closely matches the supervised baseline's performance.
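The order-free aggregation described in this summary can be illustrated with a small sketch (a hedged toy version; the name `logsumexp_pool` is invented here, and the real model pools network log-probabilities over a word vocabulary before applying cross-entropy):

```python
import math

def logsumexp_pool(frame_log_probs):
    """Pool per-frame log-probabilities over a (word) vocabulary into one
    utterance-level log-distribution via LogSumExp, then renormalize."""
    T, V = len(frame_log_probs), len(frame_log_probs[0])
    pooled = []
    for k in range(V):
        col = [frame_log_probs[t][k] for t in range(T)]
        m = max(col)  # stable LogSumExp over the time axis
        pooled.append(m + math.log(sum(math.exp(x - m) for x in col)))
    z_m = max(pooled)  # normalizer so the result sums to one
    z = z_m + math.log(sum(math.exp(x - z_m) for x in pooled))
    return [x - z for x in pooled]
```

Because the pooling sums over the time axis symmetrically, the result is invariant to the order in which words appear across frames, which is the sense in which word order "does not matter" for this loss.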

Learning from Partial Labels

This work proposes a convex learning formulation based on minimization of a loss function appropriate for the partial label setting, and analyzes the conditions under which this loss function is asymptotically consistent, as well as its generalization and transductive performance.

End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures

This work studies pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions, and reaches a new state-of-the-art for end-to-end acoustic models decoded with an external language model in the standard supervised learning setting.