Language Identification from Text Using N-gram Based Cumulative Frequency Addition
@inproceedings{Ahmed2004LanguageIF, title={Language Identification from Text Using N-gram Based Cumulative Frequency Addition}, author={B. Ahmed and S. Cha and C. Tappert}, year={2004} }
This paper describes the preliminary results of an efficient language classifier using an ad-hoc Cumulative Frequency Addition of N-grams. The new classification technique is simpler than the conventional Naive Bayesian classification method, but it performs similarly in speed overall and better in accuracy on short input strings. The classifier is also 5-10 times faster than N-gram based rank-order statistical classifiers. Language classification using N-gram based rank-order statistics has…
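The abstract names the technique without spelling out the scoring step, so the following is only a minimal sketch of how an additive N-gram classifier of this kind might look. It assumes character trigrams, frequencies normalized by each language's total N-gram count, and toy training strings; the function names (`ngrams`, `train`, `classify`) and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def ngrams(text, n=3):
    """Return all overlapping character n-grams of text (space-padded)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples, n=3):
    """Build one table of normalized n-gram frequencies per language.

    samples maps a language name to a training string (toy data here).
    Normalizing by the per-language total is an assumption of this sketch.
    """
    tables = {}
    for lang, text in samples.items():
        counts = Counter(ngrams(text, n))
        total = sum(counts.values())
        tables[lang] = {g: c / total for g, c in counts.items()}
    return tables


def classify(text, tables, n=3):
    """Sum the stored frequency of every input n-gram per language
    and return the language with the largest cumulative sum."""
    scores = {lang: 0.0 for lang in tables}
    for g in ngrams(text, n):
        for lang, table in tables.items():
            scores[lang] += table.get(g, 0.0)  # unseen n-grams add nothing
    return max(scores, key=scores.get)


# Toy training data, far too small for real use.
samples = {
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso y el gato",
}
tables = train(samples)
print(classify("the dog", tables))
print(classify("el perro", tables))
```

Because the scores are sums rather than products, an N-gram missing from a language's table simply contributes zero, so no smoothing is needed in this sketch. A Naive Bayesian variant would instead multiply probabilities (or add their logarithms) and would have to handle unseen N-grams explicitly.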
53 Citations
Comparing Neural Network Approach With N-Gram Approach For Text Categorization
- Computer Science
- 2010
- 12
Index-based n-gram extraction from large document collections
- Computer Science
- 2011 Sixth International Conference on Digital Information Management
- 2011
- 9
Text Based Language Identification System for Indian Languages Following Devanagiri Script
- Computer Science
- 2014
- 12
Language Detection Engine for Multilingual Texting on Mobile Devices
- Computer Science
- 2020 IEEE 14th International Conference on Semantic Computing (ICSC)
- 2020
- 1
Automatic Language Identification in Texts: A Survey
- Computer Science, Mathematics
- J. Artif. Intell. Res.
- 2019
- 77