Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

@inproceedings{Augustyniak2020PunctuationPI,
  title={Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?},
  author={Lukasz Augustyniak and Piotr Szymański and Mikolaj Morzy and Piotr Żelasko and Adrian Szymczak and Jan Mizgajski and Yishay Carmiel and Najim Dehak},
  booktitle={INTERSPEECH},
  year={2020}
}
Automatic Speech Recognition (ASR) systems introduce word errors, which often confuse punctuation prediction models, turning punctuation restoration into a challenging task. These errors usually take the form of homonyms. We show how retrofitting word embeddings on domain-specific data can mitigate ASR errors. Our main contribution is a method for better alignment of homonym embeddings, and the validation of the presented method on the punctuation prediction task. We record the…
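The retrofitting the abstract refers to can be illustrated with the classic post-processing scheme of Faruqui et al. (2015), which pulls each vector toward its lexicon neighbours while anchoring it to its original embedding. The sketch below is a generic illustration of that scheme, not the paper's exact alignment method; the toy homonym lexicon and parameter names are assumptions for demonstration.

```python
import numpy as np

def retrofit(embeddings, lexicon, iterations=10, alpha=1.0, beta=1.0):
    """Retrofit word vectors toward a relation lexicon.

    embeddings: dict word -> np.ndarray (originals, left unchanged)
    lexicon:    dict word -> list of related words (e.g. homonym pairs
                mined from domain-specific data)
    alpha anchors a vector to its original embedding; beta pulls it
    toward its neighbours' current vectors.
    """
    new = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in new or not nbrs:
                continue
            # Closed-form coordinate update from the retrofitting objective:
            # weighted average of the original vector and neighbour vectors.
            num = alpha * embeddings[word] + beta * sum(new[n] for n in nbrs)
            new[word] = num / (alpha + beta * len(nbrs))
    return new
```

After retrofitting, embeddings of frequently confused homonyms (e.g. "their"/"there" in ASR output) end up closer together, while words absent from the lexicon keep their original vectors.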
Joint prediction of truecasing and punctuation for conversational speech in low-resource scenarios
TLDR
This work proposes a multitask system that exploits the relations between casing and punctuation to improve prediction performance for both, and shows that by training in the written-text domain and then transfer-learning to conversations, reasonable performance can be achieved with less data.
Capitalization and Punctuation Restoration: a Survey
  • V. Pais, D. Tufis
  • Computer Science, Business
    Artificial Intelligence Review
  • 2021
TLDR
This survey offers an overview of both historical and state-of-the-art techniques for restoring punctuation and correcting word casing, and highlights current challenges and research directions.
Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech
TLDR
A multimodal semi-supervised learning approach for punctuation prediction is explored that learns representations from large amounts of unlabelled audio and text data, with an ablation study over various corpus sizes.
Comprehensive Analysis of Aspect Term Extraction Methods using Various Text Embeddings
TLDR
This work proposes a comparison, with ablation analysis, of aspect term extraction using various text embedding methods, and reveals that not only does a bidirectional long short-term memory (BiLSTM) outperform a regular LSTM, but word-embedding coverage and source also strongly affect aspect detection performance.
Deep Contextual Punctuator for NLG Text (short paper)
TLDR
This paper describes team oneNLP's participation in the SEPP-NLG 2021 shared task (Sentence End and Punctuation Prediction in NLG Text) and explores multilingual BERT and multitask learning for these tasks on English, German, French, and Italian.

References

Showing 1–10 of 41 references
Modeling punctuation prediction as machine translation
TLDR
This paper analyzes different methods for punctuation prediction, shows improvements in the quality of the final translation output, and performs a system combination of the hypotheses of all the different approaches.
Punctuation Prediction for Unsegmented Transcript Based on Word Vector
TLDR
The proposed approach to predicting punctuation marks for unsegmented speech transcripts is purely lexical, with pre-trained word vectors as the only input; it achieves better results than the state-of-the-art lexical solution that works with the same type of data, especially when predicting punctuation position only.
Punctuation Prediction Model for Conversational Speech
TLDR
This work trains two variants of deep neural network sequence labelling models, a bidirectional long short-term memory (BLSTM) and a convolutional neural network (CNN), to predict punctuation.
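Sequence-labelling models like the BLSTM and CNN above typically recast punctuation restoration as token tagging: each word is labelled with the punctuation mark that follows it (or a null label). A minimal, stdlib-only sketch of that data preparation follows; the label names (COMMA, PERIOD, QUESTION, O) are illustrative assumptions, not taken from the paper.

```python
# Map trailing punctuation characters to illustrative tag names.
PUNCT = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}

def to_labels(punctuated_text):
    """Turn a punctuated transcript into (token, label) pairs, where each
    token is labelled with the punctuation mark that follows it, or 'O'
    if none does. This is the training-data format a sequence labeller
    (BLSTM, CNN, CRF, ...) would consume."""
    tokens, labels = [], []
    for raw in punctuated_text.split():
        if raw and raw[-1] in PUNCT:
            tokens.append(raw[:-1].lower())
            labels.append(PUNCT[raw[-1]])
        else:
            tokens.append(raw.lower())
            labels.append("O")
    return list(zip(tokens, labels))
```

At inference time the model runs over raw, unpunctuated ASR output and the predicted tags are rendered back into punctuation marks.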
Question Mark Prediction By Bert
  • Yunqi Cai, D. Wang
  • Computer Science
    2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
  • 2019
TLDR
This paper proposes to discriminate question marks from periods using the self-attention mechanism of the BERT model, and demonstrates that, compared to the best baseline, the new approach improves the F1 score of question mark prediction from 30% to 90%.
Better Punctuation Prediction with Dynamic Conditional Random Fields
TLDR
The proposed approach jointly performs sentence boundary and sentence type prediction together with punctuation prediction on speech utterances; empirical results show that it outperforms an approach based on linear-chain conditional random fields as well as other previous approaches.
Experiments in Character-Level Neural Network Models for Punctuation
TLDR
The proposed character-level models are competitive in accuracy with a state-of-the-art word-level conditional random field (CRF) baseline with carefully crafted features.
LSTM for punctuation restoration in speech transcripts
TLDR
This work presents a two-stage recurrent neural network model using long short-term memory units to restore punctuation in speech transcripts, reducing the number of punctuation errors, with the largest improvements in period restoration.
Maximum entropy model for punctuation annotation from speech
TLDR
A maximum-entropy-based method for annotating spontaneous conversational speech with punctuation, to make automatic transcriptions more readable by humans and to render them into a form useful for subsequent natural language processing and discourse analysis.
Fast and Accurate Capitalization and Punctuation for Automatic Speech Recognition Using Transformer and Chunk Merging
  • B. Nguyen, V. H. Nguyen, +4 authors L. C. Mai
  • Computer Science
    2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
  • 2019
TLDR
A method to restore the punctuation and capitalization for long-speech ASR transcription is proposed based on Transformer models and chunk merging that outperforms existing methods in both accuracy and decoding speed.
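The chunk-merging idea can be sketched as splitting a long transcript into overlapping windows so a fixed-length model can process it, then discarding the already-covered positions when stitching per-chunk predictions back together. This is one plausible scheme under assumed window sizes, not the paper's exact algorithm.

```python
def chunk_with_overlap(tokens, size=8, overlap=2):
    """Split a long token sequence into overlapping chunks of at most
    `size` tokens; requires size > overlap. The overlap gives the model
    context at chunk boundaries, reducing boundary errors."""
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks

def merge_chunks(chunk_preds, overlap=2):
    """Merge per-chunk prediction sequences back into one sequence by
    dropping each later chunk's first `overlap` positions, which were
    already predicted by the previous chunk."""
    merged = list(chunk_preds[0])
    for preds in chunk_preds[1:]:
        merged.extend(preds[overlap:])
    return merged
```

Chunking followed by merging round-trips the original sequence length, so predictions line up one-to-one with the transcript tokens.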
Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech
TLDR
This work proposes a method that uses recurrent neural networks, taking prosodic and lexical information into account in order to predict punctuation marks for raw ASR output and shows that an attention mechanism over parallel sequences of prosodic cues aligned with transcribed speech improves accuracy of punctuation generation.