Text segmentation

Known as: Chinese word segmentation, Word segmentation, Word splitting 
Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental… 
Wikipedia

Papers overview

Semantic Scholar uses AI to extract papers important to this topic.
Highly Cited
2016
Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem… 
Highly Cited
2008
We present in this article a hybrid approach to automatically tokenize Vietnamese text. The approach combines both finite-state… 
Highly Cited
2006
Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages… 
Highly Cited
2004
This paper introduces a new statistical approach to automatically partitioning text into coherent segments. The approach is based… 
Highly Cited
2003
We present a domain-independent topic segmentation algorithm for multi-party speech. Our feature-based algorithm combines… 
Highly Cited
2002
The Pk evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for… 
Highly Cited
2001
We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require… 
Highly Cited
2000
This paper describes a method for linear text segmentation which is twice as accurate and over seven times as fast as the state… 
Highly Cited
2000
Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to… 
Highly Cited
1999
This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution…