Spelling Checker-based Language Identification for the Eleven Official South African Languages

@inproceedings{Pienaar2010SpellingCL,
  title={Spelling Checker-based Language Identification for the Eleven Official South African Languages},
  author={Wikus Pienaar and Dirk Snyman},
  year={2010}
}
Language identification is often the first step when compiling corpora from web pages or other unstructured sources. In this paper, an effective and accurate method for identification of all eleven official South African languages is presented. The method is based on reusing commercial spelling checkers and consists of a multi-stage architecture that is described in detail. We describe the implementation of our method, as well as an optimisation technique that was applied to reduce the…
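
The general idea of using spelling checkers for language identification can be illustrated with a minimal sketch: score a text by the fraction of its words that each language's lexicon accepts, and pick the highest-scoring language. This is not the authors' implementation and does not reproduce their multi-stage architecture or optimisation technique. The tiny word sets in `TOY_LEXICONS` are hypothetical stand-ins for real spelling-checker lexicons, and the names `tokenize` and `identify_language` are assumptions made for this example.

```python
import re

# Hypothetical toy word lists standing in for real spelling-checker lexicons.
# A real system would query an actual spelling checker for each language.
TOY_LEXICONS = {
    "afrikaans": {"die", "en", "is", "nie", "taal", "hierdie"},
    "english": {"the", "and", "is", "not", "language", "this"},
    "isizulu": {"futhi", "lolu", "ulimi", "yebo"},
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def identify_language(text, lexicons=TOY_LEXICONS):
    """Return (best_language, scores).

    Each score is the fraction of tokens that the language's lexicon
    accepts, i.e. the share of words a spelling checker for that
    language would not flag. Returns (None, scores) when no lexicon
    accepts any token, and (None, {}) for text with no tokens.
    """
    words = tokenize(text)
    if not words:
        return None, {}
    scores = {
        lang: sum(word in lexicon for word in words) / len(words)
        for lang, lexicon in lexicons.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, scores
    return best, scores
```

Calling `identify_language("Hierdie taal is nie Engels nie")` selects `"afrikaans"` under these toy lexicons, because five of the six tokens appear in the Afrikaans word set and only one in the English set. Closely related languages share many word forms, so a single acceptance ratio like this one is likely to be too coarse for them.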

