
Text segmentation

Known as: Chinese word segmentation, Word segmentation, Word splitting 
Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental…
Wikipedia

Papers overview

Semantic Scholar uses AI to extract papers important to this topic.
Highly Cited
2016
Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem…
Highly Cited
2009
In this paper, we present a segmentation methodology of handwritten documents in their distinct entities, namely, text lines and…
Highly Cited
2007
Patent documents contain important research results. However, they are lengthy and rich in technical terminology such that it…
Highly Cited
2006
Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages…
Highly Cited
2004
This paper introduces a new statistical approach to automatically partitioning text into coherent segments. The approach is based…
Highly Cited
2002
The Pk evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for…
Highly Cited
2001
We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require…
Highly Cited
2000
This paper describes a method for linear text segmentation which is twice as accurate and over seven times as fast as the state…
Highly Cited
2000
Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to…
Highly Cited
1999
This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution…