Corpus ID: 235624000

Unsupervised Topic Segmentation of Meetings with BERT Embeddings

@article{Solbiati2021UnsupervisedTS,
  title={Unsupervised Topic Segmentation of Meetings with BERT Embeddings},
  author={Alessandro Solbiati and Kevin Heffernan and Georgios Damaskinos and Shivani Poddar and Shubham Modi and Jacques Cal{\`i}},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.12978}
}
Topic segmentation of meetings is the task of dividing multi-person meeting transcripts into topic blocks. Supervised approaches to the problem have proven intractable due to the difficulties in collecting and accurately annotating large datasets. In this paper we show how previous unsupervised topic segmentation methods can be improved using pre-trained neural architectures. We introduce an unsupervised approach based on BERT embeddings that achieves a 15.5% reduction in error rate over… 

Topic Break Detection in Interview Dialogues Using Sentence Embedding of Utterance and Speech Intention Based on Multitask Neural Networks
TLDR
A method is proposed for detecting topic breaks in dialogue, with the aim of flexible topic switching in interview dialogue systems. It is based on a multi-task learning neural network that uses sentence embeddings to understand the context of the text and uses the intention of an utterance as a feature.
PREME: Preference-based Meeting Exploration through an Interactive Questionnaire
TLDR
This work proposes a novel end-to-end framework for generating interactive questionnaires for preference-based meeting exploration and introduces an automatic evaluation strategy that measures how answerable the generated questions are, to ensure factual correctness.
When headers are not there: design and user evaluation of an automatic topicalisation and labelling tool to aid the exploration of web documents by blind users
TLDR
The design and evaluation of a tool for automatically generating headers for screen readers with topicalisation and labelling algorithms, which uses Natural Language Processing techniques to divide a web document into topic segments and label each segment based on its content.

References

Topic segmentation in ASR transcripts using bidirectional RNNS for change detection
TLDR
A novel approach for topic segmentation in speech recognition transcripts by measuring lexical cohesion using bidirectional Recurrent Neural Networks (RNNs) to perform topic change detection.
SECTOR: A Neural Model for Coherent Topic Segmentation and Classification
TLDR
SECTOR is a model that supports machine reading systems by segmenting documents into coherent sections and assigning topic labels to each section; the paper reports a highest score of 71.6% F1 for the segmentation and classification of 30 topics from the English city domain.
Attention-Based Neural Text Segmentation
TLDR
This paper proposes an attention-based bidirectional LSTM model in which sentence embeddings are learned using CNNs and segments are predicted from contextual information, so that the model automatically handles variable-sized context.
Discourse Segmentation of Multi-Party Conversation
TLDR
A domain-independent topic segmentation algorithm for multi-party speech that combines knowledge about content using a text-based algorithm as a feature and about form using linguistic and acoustic cues about topic shifts extracted from speech.
SegBot: A Generic Neural Text Segmentation Model with Pointer Network
TLDR
This work proposes a generic end-to-end segmentation model called SegBot, which outperforms state-of-the-art models on both topic and EDU segmentation tasks.
Statistical Models for Text Segmentation
TLDR
Assessment of the approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains, Wall Street Journal news articles and television broadcast news story transcripts, using a new probabilistically motivated error metric.
Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books
TLDR
To align movies and books, a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book are proposed.
Linear Text Segmentation Using Affinity Propagation
TLDR
The results suggest that APS performs on par with or outperforms two very competitive state-of-the-art segmenters on topical text segmentation.
TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages
TLDR
The algorithm is fully implemented and is shown to produce segmentation that corresponds well to human judgments of the subtopic boundaries of 12 texts, which should be useful for many text analysis tasks, including information retrieval and summarization.
How Diachronic Text Corpora Affect Context based Retrieval of OOV Proper Names for Audio News
TLDR
It is concluded that a diachronic corpus with text from different sources leads to better retrieval performance than one relying on text from a single source or from a longer time span.