Noisy BiLSTM-Based Models for Disfluency Detection

@inproceedings{Bach2019NoisyBM,
  title={Noisy BiLSTM-Based Models for Disfluency Detection},
  author={Nguyen Bach and Fei Huang},
  booktitle={INTERSPEECH},
  year={2019}
}
This paper describes BiLSTM-based models for disfluency detection in speech transcripts using residual BiLSTM blocks, self-attention, and a noisy training approach. Our best model not only surpasses BERT on 4 non-Switchboard test sets, but is also 20 times smaller than the BERT-based model [1]. Thus, we demonstrate that strong performance can be achieved without extensive use of very large training data. In addition, we show that it is possible to be robust across data sets with noisy training…
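The paper does not include an implementation, but the ingredients it names are standard. Below is a minimal PyTorch sketch of a token tagger built from residual BiLSTM blocks and a self-attention layer, plus a token-replacement noising step for training; the layer sizes, the 4-head attention, and the specific noising scheme are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class ResidualBiLSTMBlock(nn.Module):
    """A BiLSTM whose output is projected back to the input width and
    added to its input (residual connection)."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, dim)

    def forward(self, x):                    # x: (batch, seq, dim)
        out, _ = self.lstm(x)
        return x + self.proj(out)

class NoisyBiLSTMTagger(nn.Module):
    """Stacked residual BiLSTM blocks, one self-attention layer, and a
    per-token classifier (e.g. fluent vs. disfluent)."""
    def __init__(self, vocab_size, dim=128, hidden=128, blocks=2, tags=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.blocks = nn.ModuleList(
            [ResidualBiLSTMBlock(dim, hidden) for _ in range(blocks)])
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, tags)

    def forward(self, tokens):               # tokens: (batch, seq) int ids
        x = self.embed(tokens)
        for block in self.blocks:
            x = block(x)
        attended, _ = self.attn(x, x, x)     # self-attention over the sequence
        return self.out(x + attended)        # per-token tag logits

def add_noise(tokens, vocab_size, p=0.1):
    """Illustrative noisy-training step: replace a random fraction of the
    input tokens so the model trains on corrupted transcripts."""
    mask = torch.rand(tokens.shape) < p
    noisy = tokens.clone()
    noisy[mask] = torch.randint(0, vocab_size, (int(mask.sum()),))
    return noisy

At train time one would tag add_noise(batch, vocab_size) while computing the loss against the clean labels; at test time the input is left unchanged.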

Citations

Disfluency Detection with Unlabeled Data and Small BERT Models
TLDR
It is demonstrated that it is possible to train disfluency detection models as small as 1.3 MiB while retaining high performance, and the effect of domain mismatch between conversational and written text on model performance is evaluated.
Span Classification with Structured Information for Disfluency Detection in Spoken Utterances
TLDR
This paper proposes a novel architecture for detecting disfluencies in transcripts from spoken utterances, incorporating both contextual information through transformers and long-distance structured information captured by dependency trees through graph convolutional networks (GCNs).
Combining Self-supervised Learning and Active Learning for Disfluency Detection
TLDR
This work investigates methods for combining self-supervised learning and active learning for disfluency detection and shows that the combined model is able to match state-of-the-art performance with just about 10% of the original training data on both the commonly used English Switchboard test set and a set of in-house annotated Chinese data.
Improved Robustness to Disfluencies in Rnn-Transducer Based Speech Recognition
TLDR
This work investigates data selection and preparation choices aimed at improved robustness of RNN-T ASR to speech disfluencies, with a focus on partial words, and shows that after including a small amount of data with disfluencies in the training set, recognition accuracy on tests with disfluencies and stuttering improves.
Disfluency Correction using Unsupervised and Semi-supervised Learning
TLDR
A disfluency correction model that translates disfluent to fluent text by drawing inspiration from recent encoder-decoder unsupervised style-transfer models for text is introduced.
Combining Self-Training and Self-Supervised Learning for Unsupervised Disfluency Detection
TLDR
This work explores the unsupervised learning paradigm which can potentially work with unlabeled text corpora that are cheaper and easier to obtain, and achieves competitive performance compared to the previous state-of-the-art supervised systems using contextualized word embeddings.
Improving Disfluency Detection by Self-Training a Self-Attentive Model
TLDR
It is shown that self-training, a semi-supervised technique for incorporating unlabeled data, sets a new state of the art for the self-attentive parser on disfluency detection, and that ensembling self-trained parsers provides further gains for disfluency detection.
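For readers unfamiliar with the technique, self-training is a short loop: fit a model on the labeled data, label unlabeled sentences with it, keep only the confident predictions, and refit on the union. A minimal sketch of that loop, where train_fn, predict_fn, and the 0.9 threshold are assumed placeholders rather than any paper's actual interfaces:

def self_train(train_fn, predict_fn, labeled, unlabeled, rounds=3, threshold=0.9):
    """Generic self-training loop.
    train_fn(examples) -> model; predict_fn(model, sentence) -> (tags, confidence).
    Both functions are caller-supplied; only the loop structure is shown here."""
    model = train_fn(labeled)                     # initial supervised model
    for _ in range(rounds):
        pseudo = []
        for sentence in unlabeled:
            tags, confidence = predict_fn(model, sentence)
            if confidence >= threshold:           # keep only confident labels
                pseudo.append((sentence, tags))
        model = train_fn(labeled + pseudo)        # retrain on the union
    return model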
Joint Prediction of Punctuation and Disfluency in Speech Transcripts
TLDR
This work proposes an attention-based structure in the task-specific layers of the MTL framework incorporating the pretrained BERT (a state-of-the-art NLP model); results show the proposed architecture outperforms both separate modeling methods and traditional MTL methods.
Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation
TLDR
This paper proposes a scheme to improve existing LM-based ASR error detection systems, both in detection scores and in resilience to distracting auxiliary tasks; the scheme adopts the popular mixup method in text feature space and can be utilized with any black-box ASR output.
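Mixup trains on convex combinations of example pairs; applied in feature space, it interpolates hidden representations along with their label distributions. A minimal sketch, assuming batched feature tensors and one-hot labels; the interpolation point and the alpha value are illustrative, not taken from the paper:

import torch

def mixup(features, labels, alpha=0.4):
    """features: (batch, dim) tensor; labels: (batch, classes) one-hot.
    Returns interpolated features and soft labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample()   # mixing weight
    perm = torch.randperm(features.size(0))                 # random partners
    mixed_x = lam * features + (1 - lam) * features[perm]
    mixed_y = lam * labels + (1 - lam) * labels[perm]
    return mixed_x, mixed_y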
Semantic Parsing of Disfluent Speech
TLDR
It is found that a state-of-the-art semantic parser does not seamlessly handle disfluency, and that adding synthetic disfluencies not only improves model performance by up to 39% but can also outperform adding real disfluencies on the ATIS dataset.

References

Showing 1-10 of 32 references
Disfluency Detection Using a Bidirectional LSTM
TLDR
A new approach for disfluency detection using a Bidirectional Long Short-Term Memory neural network (BLSTM), which takes as input pattern-match features developed to reduce sensitivity to vocabulary size in training, leading to improved performance over the word sequence alone.
Disfluency Detection with a Semi-Markov Model and Prosodic Features
We present a discriminative model for detecting disfluencies in spoken language transcripts. Structurally, our model is a semi-Markov conditional random field with features targeting characteristics…
Transition-Based Disfluency Detection using LSTMs
TLDR
This model incrementally constructs and labels the disfluency chunks of input sentences using a new transition system without syntax information; it can capture non-local chunk-level features and is free from noise in syntax.
The impact of language models and loss functions on repair disfluency detection
TLDR
It is shown that language models trained on large amounts of non-speech data improve performance more than a language model trained on a more modest amount of speech data, and that optimising f-score rather than log loss improves disfluency detection performance.
Efficient Disfluency Detection with Transition-based Parsing
TLDR
An efficient disfluency detection approach based on right-to-left transition-based parsing is proposed, which can efficiently identify disfluencies and keep ASR outputs grammatical.
Multilingual Disfluency Removal using NMT
TLDR
It is suggested that learning a joint representation of the disfluencies in multiple languages can be a promising solution to the data sparsity issue.
Multi-domain disfluency and repair detection
TLDR
The work shows that a simple CRF-based model is effective for cross-domain training, which is important for contexts where annotated data is not available, and incorporates an expanded state space for recognizing the repair structure, unlike prior work that annotates only the reparandum.
Automatic disfluency identification in conversational speech using multiple knowledge sources
TLDR
This work investigates a number of knowledge sources for disfluency detection, including acoustic-prosodic features, a language model to account for repetition patterns, a part-of-speech (POS) based LM, and rule-based knowledge, and shows that detection of disfluency interruption points is best achieved by a combination of prosodic cues, word-based cues, and POS-based cues.
A phrase-level machine translation approach for disfluency detection using weighted finite state transducers
TLDR
A novel algorithm to detect disfluency in speech is proposed by reformulating the problem as phrase-level statistical machine translation using weighted finite state transducers and simplifying the translation framework such that it does not require fertility and alignment models.
An Improved Model for Recognizing Disfluencies in Conversational Speech
TLDR
A novel metadata extraction (MDE) system for automatically detecting edited words, fillers, and self-interruption points in conversational speech that has improved the state-of-the-art, as measured in a recent blind evaluation.