Text segmentation
Known as: Chinese word segmentation, Word segmentation, Word splitting
Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental…
Wikipedia
Related topics
15 relations
Cluster analysis
Delimiter
Document classification
Hidden Markov model
Papers overview
Semantic Scholar uses AI to extract papers important to this topic.
Highly Cited
2016
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich, B. Haddow, Alexandra Birch
ACL
2016
Corpus ID: 1114678
Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem…
Highly Cited
2008
A Hybrid Approach to Word Segmentation of Vietnamese Texts
Hong Phuong Le, Thi Minh Huyen Nguyen, A. Roussanaly, H. T. Vinh
LATA
2008
Corpus ID: 15784797
We present in this article a hybrid approach to automatically tokenize Vietnamese text. The approach combines both finite-state…
Highly Cited
2006
Contextual Dependencies in Unsupervised Word Segmentation
S. Goldwater, T. Griffiths, Mark Johnson
ACL
2006
Corpus ID: 907916
Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages…
Highly Cited
2004
Statistical Models for Text Segmentation
Doug Beeferman, A. Berger, J. Lafferty
Machine Learning
2004
Corpus ID: 2839111
This paper introduces a new statistical approach to automatically partitioning text into coherent segments. The approach is based…
Highly Cited
2003
Discourse Segmentation of Multi-Party Conversation
Michel Galley, K. McKeown, E. Fosler-Lussier, Hongyan Jing
ACL
2003
Corpus ID: 5509911
We present a domain-independent topic segmentation algorithm for multi-party speech. Our feature-based algorithm combines…
Highly Cited
2002
A Critique and Improvement of an Evaluation Metric for Text Segmentation
L. Pevzner, Marti A. Hearst
Computational Linguistics
2002
Corpus ID: 6048999
The Pk evaluation metric, initially proposed by Beeferman, Berger, and Lafferty (1997), is becoming the standard measure for…
Highly Cited
2001
A Statistical Model for Domain-Independent Text Segmentation
M. Utiyama, H. Isahara
ACL
2001
Corpus ID: 10014954
We propose a statistical method that finds the maximum-probability segmentation of a given text. This method does not require…
Highly Cited
2000
Advances in domain independent linear text segmentation
Freddy Y. Y. Choi
ANLP
2000
Corpus ID: 2958363
This paper describes a method for linear text segmentation which is twice as accurate and over seven times as fast as the state…
Highly Cited
2000
Maximum Entropy Markov Models for Information Extraction and Segmentation
A. McCallum, Dayne Freitag, Fernando C Pereira
ICML
2000
Corpus ID: 775373
Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to…
Highly Cited
1999
Using Maximum Entropy for Text Classification
K. Nigam, J. Lafferty, A. McCallum
1999
Corpus ID: 574041
This paper proposes the use of maximum entropy techniques for text classification. Maximum entropy is a probability distribution…