Combining Trigram-based and Feature-based Methods for Context-Sensitive Spelling Correction

Andrew R. Golding and Yves Schabes
This paper addresses the problem of correcting spelling errors that result in valid, though unintended words (such as peace and piece, or quiet and quite) and also the problem of correcting particular word usage errors (such as amount and number, or among and between). Such corrections require contextual information and are not handled by conventional spelling programs such as Unix spell. First, we introduce a method called Trigrams that uses part-of-speech trigrams to encode the context. This… 
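The abstract describes Trigrams only as using part-of-speech trigrams to encode context. A minimal sketch of that general idea, not the authors' implementation, follows. It tries each member of a confusion set in the sentence and keeps the one whose tag sequence scores highest. Every name and number here (`TAG_OF`, `TRIGRAM`, `WORD_GIVEN_TAG`, the floor values, the single-tag-per-word simplification) is a hypothetical assumption made for illustration.

```python
import math

# All tables below are invented toy values, not estimates from any corpus.
CONFUSION_SETS = [{"quiet", "quite"}]

# Simplifying assumption: each word has exactly one part-of-speech tag.
TAG_OF = {"it": "PRON", "is": "VERB", "a": "DET", "room": "NOUN",
          "good": "ADJ", "quiet": "ADJ", "quite": "ADV"}

# P(tag | two preceding tags); "<s>" pads the start of the sentence.
TRIGRAM = {
    ("<s>", "<s>", "PRON"): 0.3, ("<s>", "PRON", "VERB"): 0.6,
    ("PRON", "VERB", "ADV"): 0.3, ("PRON", "VERB", "ADJ"): 0.3,
    ("VERB", "ADV", "ADJ"): 0.5, ("VERB", "ADJ", "ADJ"): 0.05,
    ("<s>", "<s>", "DET"): 0.3, ("<s>", "DET", "ADJ"): 0.3,
    ("<s>", "DET", "ADV"): 0.02, ("DET", "ADJ", "NOUN"): 0.7,
}
# P(word | tag) for the confusable words only; others share a default.
WORD_GIVEN_TAG = {("quiet", "ADJ"): 0.01, ("quite", "ADV"): 0.05}
TRIGRAM_FLOOR = 1e-4   # stand-in for smoothing of unseen tag trigrams
LEXICAL_DEFAULT = 1e-3


def score(words):
    """Log-probability of a word sequence under the toy tag-trigram model."""
    tags = ["<s>", "<s>"] + [TAG_OF.get(w, "UNK") for w in words]
    total = 0.0
    for i, word in enumerate(words):
        context = (tags[i], tags[i + 1], tags[i + 2])
        total += math.log(TRIGRAM.get(context, TRIGRAM_FLOOR))
        total += math.log(WORD_GIVEN_TAG.get((word, tags[i + 2]), LEXICAL_DEFAULT))
    return total


def correct(sentence):
    """Replace each confusable word with its best-scoring alternative."""
    words = sentence.lower().split()
    for i, word in enumerate(words):
        for confusion_set in CONFUSION_SETS:
            if word in confusion_set:
                words[i] = max(
                    sorted(confusion_set),
                    key=lambda c: score(words[:i] + [c] + words[i + 1:]),
                )
    return " ".join(words)


print(correct("it is quiet good"))  # it is quite good
print(correct("a quiet room"))      # a quiet room
```

The sketch can only separate words whose tags differ, as with quiet (adjective) and quite (adverb). A pair such as peace and piece, both nouns, would receive identical tag-trigram scores here and would need some other source of contextual evidence.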


Combining trigram and automatic weight distribution in Chinese spelling error correction
A novel and efficient algorithm for the system of Chinese spelling error correction, CInsunSpell, is presented, which employs a new method of automatically and dynamically distributing weights among the characters in the confusion set as well as in the Bayesian language model.
A Spelling Mistake Correction (SMC) Model for Resolving Real-Word Error
A spelling correction system is proposed that focuses on accurate and efficient automatic identification and correction of real-word errors; it hybridizes Trigram and Bayesian approaches and uses a Modified Brown corpus as its training set.
The results reported in this paper are encouraging and show that the method is effective, capable of detecting context-sensitive errors with an accuracy in the range of roughly 86% to 95%.
Discriminative reranking for context-sensitive spell-checker
A discourse-aware discriminative model is proposed to improve the results of context-sensitive spell-checkers by reranking their n-best lists with a log-linear reranker; it achieves state-of-the-art performance on the Persian language.
Error Detection and Corrections in Indic OCR Using LSTMs
A Long Short-Term Memory based character level language model with a fixed delay for discriminative language modeling in the context of OCR errors for jointly addressing the problems of error detection and correction in Indic OCR is adopted.
Using Part-of-Speech and Word-Sense Disambiguation for Boosting String-Edit Distance Spelling Correction
The design of a system for correcting spelling errors that result in non-existent words is described, and a significant improvement over context-free spelling correction is reported.
A simple real-word error detection and correction using local word bigram and trigram
A localized real word error detection and correction method is proposed where the scores of bigrams generated by immediate left and right neighbour of the candidate word and the trigram of these three words are combined.
Learning to Find Context Based Spelling Errors
This chapter presents an effective method called Ltest, which learns from prior, correct text how context-based spelling errors may manifest themselves, by purposely introducing such errors and analyzing the resulting text using a data mining algorithm.
Text Induced Spelling Correction
TISC, a language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora, is presented and the implemented prototype is described and evaluated.


A Bayesian Hybrid Method for Context-sensitive Spelling Correction
This paper takes Yarowsky's work as a starting point, applying decision lists to the problem of context-sensitive spelling correction, and finds that further improvements can be obtained by taking into account not just the single strongest piece of evidence, but ALL the available evidence.
Techniques for automatically correcting words in text
Research aimed at correcting words in text has focused on three progressively more difficult problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent word correction. The article also surveys documented findings on spelling error patterns.
Context based spelling correction
A method for disambiguating word senses in a large corpus
The proposed method was designed to disambiguate senses that are usually associated with different topics using a Bayesian argument that has been applied successfully in related tasks such as author identification and information retrieval.
A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text
A program that tags each word in an input sentence with the most likely part of speech has been written and performance is encouraging; a 400-word sample is presented and is judged to be 99.5% correct.
Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French
This paper presents a statistical decision procedure for lexical ambiguity resolution. The algorithm exploits both local syntactic patterns and more distant collocational evidence, generating an
A note on undetected typing errors
Although human proofreading is still necessary, small, topic-specific word lists in spelling programs will minimize the occurrence of undetected typing errors.
Random House Unabridged Dictionary. Random House, 1983