Language Recognition for Creating Corpora

Yevgeny Ludovik and Ron Zacharski

In this paper we describe a language recognition algorithm for multilingual documents that is based on mixed-order n-grams, Markov chains, maximum likelihood, and dynamic programming. We present the results of an experimental study that showed that the performance of this algorithm has practical value. 
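
The abstract names the ingredients but not how they fit together, so here is a minimal sketch of one way they can be combined. It is not the authors' algorithm. It assumes fixed-order character bigrams rather than mixed-order n-grams, add-alpha smoothing, whitespace tokenization, and a constant penalty for switching language between adjacent words. The names `train`, `score_word`, `segment`, `alpha`, and `switch_penalty` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def train(samples, alpha=0.5):
    """Build one character-bigram Markov chain per language.

    `samples` maps a language label to training text. Assumption: words are
    whitespace-delimited and padded with a space on each side.
    """
    vocab = {ch for text in samples.values() for ch in text} | {" "}
    models = {}
    for lang, text in samples.items():
        counts = defaultdict(Counter)
        padded = " " + " ".join(text.split()) + " "
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1
        models[lang] = counts
    # One extra vocabulary slot stands in for characters unseen in training.
    return models, len(vocab) + 1, alpha


def score_word(counts, word, vocab_size, alpha):
    """Smoothed log-likelihood of `word` under one language's bigram chain."""
    padded = " " + word + " "
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        row = counts.get(prev, {})
        numerator = row.get(cur, 0) + alpha
        denominator = sum(row.values()) + alpha * vocab_size
        total += math.log(numerator / denominator)
    return total


def segment(text, model, switch_penalty=2.0):
    """Label each word with a language using dynamic programming.

    The labelling maximizes the summed word log-likelihoods minus
    `switch_penalty` for every change of language between adjacent words.
    """
    models, vocab_size, alpha = model
    langs = sorted(models)
    tokens = text.split()
    if not tokens:
        return []

    def cost(prev_lang, lang):
        return switch_penalty if prev_lang != lang else 0.0

    best = {l: score_word(models[l], tokens[0], vocab_size, alpha) for l in langs}
    back = []
    for tok in tokens[1:]:
        new_best, pointers = {}, {}
        for l in langs:
            prev = max(langs, key=lambda p: best[p] - cost(p, l))
            new_best[l] = (best[prev] - cost(prev, l)
                           + score_word(models[l], tok, vocab_size, alpha))
            pointers[l] = prev
        back.append(pointers)
        best = new_best

    # Backtrack from the best final language to recover the full labelling.
    last = max(langs, key=best.get)
    labels = [last]
    for pointers in reversed(back):
        last = pointers[last]
        labels.append(last)
    labels.reverse()
    return list(zip(tokens, labels))
```

Calling `segment(document, train({"en": english_text, "es": spanish_text}))` returns a list of `(word, language)` pairs. A larger `switch_penalty` favors longer single-language runs, and a smaller one lets the labelling follow word-level evidence more closely.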
