Text segmentation
Known as: Chinese word segmentation, Word segmentation, Word splitting
Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental…
Wikipedia
Related topics
15 relations
Cluster analysis
Delimiter
Document classification
Hidden Markov model
Papers overview
Semantic Scholar uses AI to extract papers important to this topic.
Highly Cited
Topic segmentation with shared topic detection and alignment of multiple documents
Bingjun Sun, P. Mitra, C. Lee Giles, J. Yen, H. Zha
Annual International ACM SIGIR Conference on…
2007
Corpus ID: 17839569
Topic detection and tracking and topic segmentation play an important role in capturing the local and sequential information of…
Thoughts on Word and Sentence Segmentation in Thai
Wirote Aroonmanakun
2007
Corpus ID: 1048077
This paper discusses problems of word and sentence segmentation in Thai. Disagreements on word segmentation are caused mostly…
Semi-automatic Ground Truth Generation for Chart Image Recognition
Li Yang, Weihua Huang, C. Tan
International Workshop on Document Analysis…
2006
Corpus ID: 15280678
While research on scientific chart recognition is being carried out, there is no suitable standard that can be used to evaluate…
Highly Cited
Camera-based Kanji OCR for mobile-phones: practical issues
Masashi Koga, Ryuji Mine, Tatsuya Kameyama, Toshikazu Takahashi, Masahiro Yamazaki, Teruyuki Yamaguchi
IEEE International Conference on Document…
2005
Corpus ID: 23479030
A camera based optical character reader (OCR) for Japanese Kanji characters was implemented on a mobile phone. This OCR has three…
Highly Cited
Report on CLEF-2001 Experiments: Effective Combined Query-Translation Approach
J. Savoy
Conference and Labs of the Evaluation Forum
2001
Corpus ID: 9943803
In our first participation in CLEF retrieval tasks, the primary objective was to define a general stopword list for various…
Highly Cited
Text analysis using local energy
Woei Chan, G. Coghill
Pattern Recognition
2001
Corpus ID: 14593040
Highly Cited
Compound splitting and lexical unit recombination for improved performance of a speech recognition system for German parliamentary speeches
M. Larson, D. Willett, J. Köhler, G. Rigoll
Interspeech
2000
Corpus ID: 1788721
This paper proposes a novel combined compound splitting and phrase recombination method that optimizes the composition of the…
Highly Cited
Text identification for document image analysis using a neural network
C. Strouthopoulos, N. Papamarkos
Image and Vision Computing
1998
Corpus ID: 1755656
Highly Cited
Automatic separation of words in multi-lingual multi-script Indian documents
U. Pal, B. B. Chaudhuri
Proceedings of the Fourth International…
1997
Corpus ID: 7753713
In a multi-lingual country like India, a document may contain more than one script forms. For such a document it is necessary to…
Highly Cited
Segmentation of Fluent Speech into Words: Learning Models and the Role of Maternal Input
R. Aslin
1993
Corpus ID: 62503223
Two research strategies aimed at understanding how maternal speech input enables pre-productive infants to segment words from…