Punctuation Prediction Model for Conversational Speech

@inproceedings{Zelasko2018PunctuationPM,
  title={Punctuation Prediction Model for Conversational Speech},
  author={Piotr Żelasko and Piotr Szymański and Jan Mizgajski and Adrian Szymczak and Yishay Carmiel and Najim Dehak},
  booktitle={INTERSPEECH},
  year={2018}
}
An ASR system usually does not predict any punctuation or capitalization. […] The models are trained on the Fisher corpus, which includes punctuation annotation. In our experiments, we combine time-aligned and punctuated Fisher corpus transcripts using a sequence alignment algorithm. The neural networks are trained on Common Crawl GloVe embeddings of the words in the Fisher transcripts, aligned with conversation side indicators and word time information. The CNNs yield better precision and the BLSTMs…
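As a rough illustration of the pipeline the abstract describes, the sketch below shows one way the label transfer and feature construction could look. It is a minimal sketch, not the authors' code: difflib.SequenceMatcher stands in for the sequence alignment algorithm used in the paper, the punctuation label set {",", ".", "?"} is assumed, and the feature layout (GloVe vector, conversation-side flag, word start time and duration) is only an assumed approximation of the inputs named in the abstract; every function and variable name here is hypothetical.

```python
# Minimal sketch under the assumptions stated above; not the paper's implementation.
from difflib import SequenceMatcher

import numpy as np

PUNCT = {",", ".", "?"}  # assumed label inventory


def words_and_labels(punctuated_tokens):
    """Split e.g. 'mean.' into ('mean', '.'); unpunctuated words get an empty label."""
    words, labels = [], []
    for tok in punctuated_tokens:
        labels.append(tok[-1] if tok and tok[-1] in PUNCT else "")
        words.append(tok.rstrip("".join(PUNCT)).lower())
    return words, labels


def transfer_labels(time_aligned_words, punctuated_tokens):
    """Copy punctuation labels onto the time-aligned words over matched spans.

    SequenceMatcher is a stand-in for the sequence alignment algorithm
    mentioned in the abstract; unmatched words keep an empty label.
    """
    ref_words, ref_labels = words_and_labels(punctuated_tokens)
    hyp = [w.lower() for w in time_aligned_words]
    labels = [""] * len(hyp)
    matcher = SequenceMatcher(a=hyp, b=ref_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            labels[block.a + k] = ref_labels[block.b + k]
    return labels


def word_features(word, side, start, duration, glove, dim=300):
    """Concatenate a GloVe vector, a conversation-side flag (A/B), and word timing."""
    vec = np.asarray(glove.get(word, np.zeros(dim)), dtype=np.float32)
    side_flag = np.array([1.0 if side == "A" else 0.0], dtype=np.float32)
    timing = np.array([start, duration], dtype=np.float32)
    return np.concatenate([vec, side_flag, timing])


if __name__ == "__main__":
    aligned = ["yeah", "i", "know", "what", "you", "mean"]
    punctuated = ["Yeah,", "I", "know", "what", "you", "mean."]
    print(transfer_labels(aligned, punctuated))  # [',', '', '', '', '', '.']
    toy_glove = {"yeah": np.ones(300, dtype=np.float32)}  # hypothetical lookup table
    print(word_features("yeah", "A", 0.0, 0.21, toy_glove).shape)  # (303,)
```

The per-word labels produced this way would serve as prediction targets, and the concatenated vectors as inputs, for the CNN and BLSTM taggers compared in the paper.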

Citations

Joint Prediction of Truecasing and Punctuation for Conversational Speech in Low-Resource Scenarios
TLDR
This work proposes a multi-task system that exploits the relations between casing and punctuation to improve the prediction of both, and shows that by training the model on written text and then transferring to conversations, it can achieve reasonable performance with less data.
Discriminative Self-training for Punctuation Prediction
TLDR
A discriminative self-training approach with weighted loss and discriminative label smoothing is proposed, which exploits unlabeled speech transcripts to improve punctuation prediction accuracy over strong baselines including BERT, RoBERTa, and ELECTRA models.
Word-level BERT-CNN-RNN Model for Chinese Punctuation Restoration
TLDR
A hybrid model is proposed that combines Bidirectional Encoder Representations from Transformers (BERT), a convolutional neural network, and a recurrent neural network, and can extract word-level features for the Chinese language.
Fast and Accurate Capitalization and Punctuation for Automatic Speech Recognition Using Transformer and Chunk Merging
  • B. Nguyen, V. H. Nguyen, Luong Chi Mai
  • Computer Science
    2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
  • 2019
TLDR
A method based on Transformer models and chunk merging is proposed to restore punctuation and capitalization for long-speech ASR transcripts, outperforming existing methods in both accuracy and decoding speed.
Improving Punctuation Restoration for Speech Transcripts via External Data
TLDR
This paper introduces a data sampling technique based on an n-gram language model to sample more training data that are similar to in-domain data, and proposes a two-stage fine-tuning approach that utilizes the sampled external data as well as the authors' in-domain dataset for models based on BERT.
Transfer Learning for Punctuation Prediction
TLDR
This work treats punctuation prediction as a sequence tagging task and proposes an architecture that uses pre-trained BERT embeddings and significantly improves the state of the art on the IWSLT dataset.
Multimodal Punctuation Prediction with Contextual Dropout
TLDR
A transformer-based approach for punctuation prediction that achieves an 8% improvement on the IWSLT 2012 TED task, beating the previous state of the art, together with a training scheme using contextual dropout that allows the model to handle variable amounts of future context at test time.
Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?
TLDR
This work shows how retrofitting word embeddings on domain-specific data can mitigate ASR errors, proposes a method for better alignment of homonym embeddings, and validates the presented method on the punctuation prediction task.
Self-attention Based Model for Punctuation Prediction Using Word and Speech Embeddings
  • Jiangyan Yi, J. Tao
  • Computer Science
    ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2019
TLDR
The results show that the self-attention based model trained using word and speech embedding features outperforms the previous state-of-the-art single model by up to 7.8% absolute overall F1-score.
Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech
TLDR
A multimodal semi-supervised learning approach for punctuation prediction is explored, which learns representations from large amounts of unlabelled audio and text data, with an ablation study performed on various corpus sizes.
...

References

LSTM for punctuation restoration in speech transcripts
TLDR
This work presents a two-stage recurrent neural network model using long short-term memory units to restore punctuation in speech transcripts, reducing the number of punctuation errors, with the largest improvements in period restoration.
Better Punctuation Prediction with Dynamic Conditional Random Fields
TLDR
The proposed approach jointly performs sentence boundary, sentence type, and punctuation prediction on speech utterances; empirical results show that it outperforms an approach based on linear-chain conditional random fields and other previous approaches.
Improved models for automatic punctuation prediction for spoken and written text
TLDR
Improved models for the automatic prediction of punctuation marks in written or spoken text, using Conditional Random Fields, outperform a hidden-event language model by up to 26% relative in F-score.
Maximum entropy model for punctuation annotation from speech
TLDR
A maximum-entropy-based method is presented for annotating spontaneous conversational speech with punctuation, to make automatic transcriptions more readable by humans and to render them into a form useful for subsequent natural language processing and discourse analysis.
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
We present Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs or other components of traditional speech recognizers.
Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration
TLDR
A bidirectional recurrent neural network model with an attention mechanism is presented for punctuation restoration in unsegmented text, outperforming the previous state of the art on English and Estonian datasets by a large margin.
Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI
TLDR
A method to perform sequence-discriminative training of neural network acoustic models without the need for frame-level cross-entropy pre-training is described, using the lattice-free version of the maximum mutual information (MMI) criterion: LF-MMI.
Punctuation annotation using statistical prosody models.
TLDR
A statistical finite state model that combines prosodic, linguistic and punctuation class features to generate linguistic meta-data for spoken language is presented.
Automatic linguistic segmentation of conversational speech
  • A. Stolcke, Elizabeth Shriberg
  • Linguistics, Computer Science
    Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96
  • 1996
TLDR
A simple automatic segmenter of transcripts based on N-gram language modeling is presented, which achieves 85% recall and 70% precision on linguistic boundary detection; the relevance of several word-level features for segmentation performance is also studied.
GloVe: Global Vectors for Word Representation
TLDR
A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
...