• Corpus ID: 235624000

Unsupervised Topic Segmentation of Meetings with BERT Embeddings

Alessandro Solbiati, Kevin Heffernan, Georgios Damaskinos, Shivani Poddar, Shubham Modi, Jacques Calì
Topic segmentation of meetings is the task of dividing multi-person meeting transcripts into topic blocks. Supervised approaches to the problem have proven intractable due to the difficulties in collecting and accurately annotating large datasets. In this paper we show how previous unsupervised topic segmentation methods can be improved using pre-trained neural architectures. We introduce an unsupervised approach based on BERT embeddings that achieves a 15.5% reduction in error rate over… 

Improving Topic Segmentation by Injecting Discourse Dependencies

The empirical study on English evaluation datasets shows that injecting above-sentence discourse structures into a neural topic segmenter with the proposed strategy can substantially improve its performance on intra-domain and out-of-domain data, with little increase in the model's complexity.

Topic Break Detection in Interview Dialogues Using Sentence Embedding of Utterance and Speech Intention Based on Multitask Neural Networks

A method is proposed for detecting topic breaks in dialogue, with the aim of flexible topic switching in interview dialogue systems. It is based on a multi-task neural network that uses sentence embeddings to capture the context of the text and uses the intention of an utterance as a feature.

PREME: Preference-based Meeting Exploration through an Interactive Questionnaire

This work proposes a novel end-to-end framework for generating interactive questionnaires for preference-based meeting exploration and introduces an automatic evaluation strategy that measures how answerable the generated questionnaire questions are, to ensure factual correctness.

When headers are not there: design and user evaluation of an automatic topicalisation and labelling tool to aid the exploration of web documents by blind users

The design and evaluation of a tool for automatically generating headers for screen readers with topicalisation and labelling algorithms, which uses Natural Language Processing techniques to divide a web document into topic segments and label each segment based on its content.

Structured Summarization: Unified Text Segmentation and Segment Labeling as a Generation Task

Text segmentation aims to divide text into contiguous, semantically coherent segments, while segment labeling deals with producing labels for each segment. Past work has shown success in tackling



Topic segmentation in ASR transcripts using bidirectional RNNs for change detection

A novel approach for topic segmentation in speech recognition transcripts by measuring lexical cohesion using bidirectional Recurrent Neural Networks (RNNs) to perform topic change detection.

SECTOR: A Neural Model for Coherent Topic Segmentation and Classification

This work presents SECTOR, a model that supports machine reading systems by segmenting documents into coherent sections and assigning topic labels to each section, and reports a highest score of 71.6% F1 for the segmentation and classification of 30 topics from the English city domain.

A Joint Model for Document Segmentation and Segment Labeling

This work introduces Segment Pooling LSTM (S-LSTM), which is capable of jointly segmenting a document and labeling segments, and develops a method for teaching the model to recover from errors by aligning the predicted and ground truth segments.

Topic Segmentation for Dialogue Stream

  • Leilan Zhang, Qiang Zhou
  • Computer Science
    2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
  • 2019
This paper formulates topic segmentation as a sequence labeling task and proposes a model based on BERT and TCN (Temporal Convolutional Network) to accomplish the task, which shows an absolute performance improvement of 8% to 17% in F1 scores.

A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining

A novel abstractive summary network that adapts to the meeting scenario is proposed with a hierarchical structure to accommodate long meeting transcripts and a role vector to depict the difference among speakers.

Attention-Based Neural Text Segmentation

This paper proposes an attention-based bidirectional LSTM model where sentence embeddings are learned using CNNs and the segments are predicted based on contextual information that can automatically handle variable sized context information.

Discourse Segmentation of Multi-Party Conversation

A domain-independent topic segmentation algorithm for multi-party speech that combines knowledge about content using a text-based algorithm as a feature and about form using linguistic and acoustic cues about topic shifts extracted from speech.

An Analysis of Quantitative Aspects in the Evaluation of Thematic Segmentation Algorithms

It is shown that evaluation on synthetic data is potentially misleading and fails to give an accurate evaluation of the performance on real data, and a critical review of existing evaluation metrics in the literature and an improved evaluation metric are provided.

Statistical Models for Text Segmentation

Assessment of the approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains, Wall Street Journal news articles and television broadcast news story transcripts, using a new probabilistically motivated error metric.

Meeting Structure Annotation

We describe a generic set of tools for representing, annotating, and analysing multi-party discourse, including: an ontology of multimodal discourse, a programming interface for that ontology, and