Corpus ID: 4337447

Spelling Correction: from Two-Level Morphology to Open Source

I. Alegria, K. Ceberio, N. Ezeiza, Aitor Soroa Etxabe, Gregorio Hernández
Basque is a highly inflected and agglutinative language (Alegria et al., 1996). Two-level morphology has been applied successfully to this kind of language, and two-level descriptions exist for very different languages. Once the morphological description of a language is done, it is easy to develop a spelling checker/corrector for that language. However, what happens if we want to use the speller in the “free world” (OpenOffice, Mozilla, emacs, LaTeX, etc.)? Ispell and similar tools…
Porting Basque Morphological Grammars to foma, an Open-Source Tool
The process of porting a previous description of Basque morphology to foma, an open-source finite-state toolkit compatible with Xerox tools, is described, and the development of a two-level grammar with parallel alternation rules is compared with that of a sequential grammar developed by composing individual replacement rules.
Spelling Correction for Kazakh
This paper describes a spelling correction method for Kazakh that takes advantage of both morphological analysis and a noisy-channel model, and that outperforms both open-source and commercial analogues in terms of overall accuracy.
Spell checking algorithm for agglutinative languages “Central Kurdish as an example”
An algorithm to design and build a reliable and comprehensive Kurdish spell checker that can be used by the public in Kurdistan and was implemented in C# and as a…
Normalization of Kazakh Texts
The main goal of this work is to develop a better algorithm for the normalization of Kazakh texts based on traditional and machine learning methods, as well as the new approach which is also considered in this paper.


A spelling corrector for Basque based on morphology
The Xuxen spelling checker/corrector performs morphological decomposition in order to check misspellings and, to correct them, uses a new strategy which combines the use of an additional two-level morphological subsystem for orthographic errors.
Designing spelling correctors for inflected languages using lexical transducers
This paper describes the components used in the design of the commercial Xuxen II spelling checker/corrector for Basque. It is a new version of the Xuxen spelling corrector (Aduriz et al., 97)…
Automatic morphological analysis of Basque
The components of a robust and wide-coverage morphological analyser for Basque, based on the two-level formalism, are described, and both the performance of the different components of the system and the description itself are improved.
Techniques for automatically correcting words in text
Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent work…
Transfer-Based MT from Spanish into Basque: Reusability, Standardization and Open Source
An open architecture for rule-based machine translation from Spanish into Basque, which reuses several open tools and is based on a unique XML format for the flow between the different modules, making interaction easier among different developers of tools and resources.
Leveraging the open source ispell codebase for minority language analysis
This paper describes how the SzoSzablya ‘WordSword’ project leverages ispell’s Hungarian descendant, HunSpell, to create a whole set of related tools that tackle a wide range of low-level NLP-related tasks such as character set normalization, language detection, spellchecking, stemming, and morphological analysis.
Finite State Morphology
Finite State Applications for Basque. Proc. of EACL'2003 Workshop on Finite-State Methods in Natural Language Processing
  • 2003
A General Computational Model for Word-Form Recognition and Production
Two-level morphology: A general computational model of word-form recognition and production
  • Tech. Rep. Publication No. 11
  • 1983