Training and Domain Adaptation for Supervised Text Segmentation
@inproceedings{Glavas2021TrainingAD, title={Training and Domain Adaptation for Supervised Text Segmentation}, author={Goran Glavas and Ananya Ganesh and Swapna Somasundaran}, booktitle={BEA}, year={2021} }
Unlike traditional unsupervised text segmentation methods, recent supervised segmentation models rely on Wikipedia as the source of large-scale segmentation supervision. These models have, however, predominantly been evaluated on the in-domain (Wikipedia-based) test sets, preventing conclusions about their general segmentation efficacy. In this work, we focus on the domain transfer performance of supervised neural text segmentation in the educational domain. To this end, we first introduce…
One Citation
Sustainable Modular Debiasing of Language Models
- Computer Science
- EMNLP
- 2021
An extensive evaluation, encompassing three intrinsic and two extrinsic bias measures, shows ADELE to be very effective in bias mitigation; due to its modular nature, ADELE also retains fairness even after large-scale downstream training.
References
Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation
- Computer Science
- AAAI
- 2020
A novel supervised model for text segmentation with simple but explicit coherence modeling that couples the sentence-level segmentation objective with the coherence objective that differentiates correct sequences of sentences from corrupt ones and can successfully segment texts in languages unseen in training.
Text Segmentation as a Supervised Learning Task
- Computer Science
- NAACL
- 2018
This work formulates text segmentation as a supervised learning problem, presents a large new dataset for text segmentation that is automatically extracted and labeled from Wikipedia, and develops a segmentation model that generalizes well to unseen natural text.
Neural Text Segmentation and its Application to Sentiment Analysis
- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2022
This work proposes a generic end-to-end segmentation model which first uses a bidirectional recurrent neural network to encode an input text sequence.
MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale
- Computer Science
- EMNLP
- 2020
This work proposes combining self-supervised with supervised multi-task learning on all available source domains; the best zero-shot transfer model considerably outperforms in-domain BERT and the previous state of the art on six benchmarks.
Statistical Models for Text Segmentation
- Computer Science
- Machine Learning
- 2004
Assessment of the approach on quantitative and qualitative grounds demonstrates its effectiveness in two very different domains, Wall Street Journal news articles and television broadcast news story transcripts, using a new probabilistically motivated error metric.
Applying Machine Learning to Text Segmentation for Information Retrieval
- Computer Science
- Information Retrieval
- 2004
It is found that at around 70% word segmentation accuracy an over-segmentation phenomenon begins to occur, reducing information retrieval performance. This suggests that words themselves might be too broad a notion to conveniently capture the general semantic meaning of Chinese text.
C-HTS: A Concept-based Hierarchical Text Segmentation approach
- Computer Science
- LREC
- 2018
This paper proposes C-HTS, a Concept-based Hierarchical Text Segmentation approach that uses the semantic relatedness between text constituents, and uses the explicit semantic representation of text, automatically extracted from massive human knowledge repositories such as Wikipedia.
Exploring Influence of Topic Segmentation on Information Retrieval Quality
- Computer Science
- INSCI
- 2018
A search pipeline based on text segmentation by means of BigARTM tool and TopicTiling algorithm is proposed, which allows one to better model text structure and therefore language itself, which influences the quality of text representation.
TopicTiling: A Text Segmentation Algorithm based on LDA
- Computer Science
- ACL 2012
- 2012
This work presents a text segmentation algorithm called TopicTiling, which is based on the well-known TextTiling algorithm, segments documents using the Latent Dirichlet Allocation topic model, and is computationally less expensive than other LDA-based segmentation methods.
Hierarchical Text Segmentation from Multi-Scale Lexical Cohesion
- Computer Science
- NAACL
- 2009
This paper presents a novel unsupervised method for hierarchical topic segmentation that takes the form of a coordinate-ascent algorithm, iterating between two steps: a novel dynamic program for obtaining the globally-optimal hierarchical segmentation, and collapsed variational Bayesian inference over the hidden variables.