Corpus ID: 17221043

Language Identification from Text Using N-gram Based Cumulative Frequency Addition

@inproceedings{Ahmed2004LanguageIF,
  title={Language Identification from Text Using N-gram Based Cumulative Frequency Addition},
  author={B. Ahmed and S. Cha and C. Tappert},
  year={2004}
}
  • B. Ahmed, S. Cha, C. Tappert
  • Published 2004
  • Computer Science
  • This paper describes the preliminary results of an efficient language classifier using an ad-hoc Cumulative Frequency Addition of N-grams. The new classification technique is simpler than the conventional Naive Bayesian classification method, but it performs similarly in speed overall and better in accuracy on short input strings. The classifier is also 5-10 times faster than N-gram based rank-order statistical classifiers. Language classification using N-gram based rank-order statistics has…
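The abstract only names the technique, so the following is a minimal Python sketch of cumulative frequency addition under stated assumptions, not the authors' implementation. Assumed here: character n-grams of lengths 2 and 3 with `_` marking word boundaries, and a per-language table where each n-gram's value is its count divided by the total n-gram count of that language's training text. The paper's actual n-gram lengths and normalization may differ. The function names (`char_ngrams`, `train`, `classify_cfa`, `classify_naive_bayes`) and the toy corpora are hypothetical. A log-probability Naive Bayes scorer is included only for contrast with the "simpler than Naive Bayesian" claim.

```python
from collections import Counter
import math

def char_ngrams(text, n_values=(2, 3)):
    """Character n-grams of each word, with '_' marking word boundaries (assumed scheme)."""
    grams = []
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in n_values:
            for i in range(len(padded) - n + 1):
                grams.append(padded[i:i + n])
    return grams

def train(corpora, n_values=(2, 3)):
    """Per-language table: n-gram count divided by the total n-gram count of that language's text."""
    tables = {}
    for lang, text in corpora.items():
        counts = Counter(char_ngrams(text, n_values))
        total = sum(counts.values())
        tables[lang] = {gram: count / total for gram, count in counts.items()}
    return tables

def classify_cfa(text, tables, n_values=(2, 3)):
    """Cumulative frequency addition: sum each test n-gram's frequency per language; largest sum wins.
    N-grams a language never saw simply contribute 0."""
    grams = char_ngrams(text, n_values)
    scores = {lang: sum(table.get(g, 0.0) for g in grams) for lang, table in tables.items()}
    return max(scores, key=scores.get), scores

def classify_naive_bayes(text, tables, n_values=(2, 3), floor=1e-6):
    """Naive Bayes for contrast: sum of log-frequencies, with an arbitrary floor for unseen n-grams."""
    grams = char_ngrams(text, n_values)
    scores = {lang: sum(math.log(table.get(g, floor)) for g in grams) for lang, table in tables.items()}
    return max(scores, key=scores.get), scores

# Hypothetical toy training texts; real classifiers train on far larger corpora.
corpora = {
    "english": "the cat sat on the mat and the dog ate the bone",
    "german": "der hund und die katze sind in dem haus und der garten",
}
tables = train(corpora)
print(classify_cfa("the dog", tables)[0])        # english
print(classify_cfa("und der hund", tables)[0])   # german
```

In this sketch, CFA only adds frequencies, so an unseen n-gram contributes 0 and no smoothing is needed. The Naive Bayes variant must multiply probabilities (here, add logs) and therefore needs a floor (an arbitrary 1e-6) to avoid `log(0)`.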
    53 Citations
    Comparing Neural Network Approach With N-Gram Approach For Text Categorization
    • 12
    The textcat Package for n-Gram Based Text Categorization in R
    • 59
    Selecting and Weighting N-Grams to Identify 1100 Languages
    • 36
    Index-based n-gram extraction from large document collections
    • 9
    Language identification in texts
    • 2
    • Highly Influenced
    Text-based language identification for the South African languages
    • 21
    Language Detection Engine for Multilingual Texting on Mobile Devices
    • 1
    Automatic Language Identification in Texts: A Survey
    • 77

    References

    Showing 10 of 15 references
    N-gram-based text categorization
    • 1,693
    • Highly Influential
    Multilingual Sentence Categorization according to Language
    • 17
    • Highly Influential
    High-quality text-to-speech synthesis : an overview
    • 88
    Multilingual text analysis for text-to-speech synthesis
    • 87
    Mixed-lingual text analysis for polyglot TTS synthesis
    • 35
    • Highly Influential
    From multilingual to polyglot speech synthesis
    • 76
    • Highly Influential
    N-gram based Text Categorization, Symposium on Document Analysis and Information Retrieval
    • 1994
    Statistical Identification of Languages
    • 1994
    Artificial Speech: Two centuries of tinkering finally produce a sweettalking machine
    • DISCOVER Vol
    • 2003
    The Mathematics of . . . Artificial Speech: Two centuries of tinkering finally produce a sweettalking machine
    • 2003