A Winnow-Based Approach to Context-Sensitive Spelling Correction

@article{Golding1999AWA,
  title={A Winnow-Based Approach to Context-Sensitive Spelling Correction},
  author={Andrew R. Golding and Dan Roth},
  journal={Machine Learning},
  year={1999},
  volume={34},
  pages={107-130}
}
A large class of machine-learning problems in natural language requires the characterization of linguistic context. Two characteristic properties of such problems are that their feature space is of very high dimensionality, and their target concepts depend on only a small subset of the features in the space. Under such conditions, multiplicative weight-update algorithms such as Winnow have been shown to have exceptionally good theoretical properties. In the work reported here, we present an…
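Since the abstract turns on Winnow's multiplicative weight updates, a minimal sketch may help make the scheme concrete. This is a generic rendering of Littlestone's Winnow, not necessarily the exact variant used in the paper; the promotion factor `alpha` and the threshold are illustrative choices.

```python
class Winnow:
    """Littlestone's Winnow: a linear threshold unit whose weights
    are updated multiplicatively, and only when it errs."""

    def __init__(self, n_features, alpha=2.0):
        self.alpha = alpha               # promotion/demotion factor (> 1)
        self.theta = float(n_features)   # a common threshold choice
        self.w = [1.0] * n_features      # all weights start at 1

    def predict(self, active):
        """`active` lists the indices of the binary features that fire,
        which keeps prediction cheap in a sparse, high-dimensional space."""
        return sum(self.w[i] for i in active) >= self.theta

    def update(self, active, label):
        """Mistake-driven: promote active weights on a missed positive,
        demote them on a false positive, touch nothing otherwise."""
        if self.predict(active) == label:
            return
        factor = self.alpha if label else 1.0 / self.alpha
        for i in active:
            self.w[i] *= factor
```

The attraction in this setting is Winnow's mistake bound, which grows only logarithmically with the number of irrelevant attributes, exactly the regime the abstract describes: a very high-dimensional feature space in which only a small subset of features matters.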
MACHINE LEARNING APPROACH FOR CONTEXT-SENSITIVE ERROR DETECTION
Context-sensitive spelling errors are errors that result from mistyping or mispronouncing a word such that the resulting misspelling is itself a valid dictionary word. For example, “This
Detection is the central problem in real-word spelling correction
It is shown that the central problem in real-word spelling correction is detection: merely discriminating between the intended word and a random close variation of it within the context of a sentence can be performed with high accuracy using straightforward models.
Discriminative reranking for context-sensitive spell-checker
A discourse-aware discriminative model is proposed to improve the results of context-sensitive spell-checkers by reranking their n-best lists, employing discourse features in a log-linear reranker, and achieves state-of-the-art performance on Persian.
A Complex Approach to Spellchecking and Autocorrection for Russian
This study discusses a number of methods that can be used jointly for error detection and correction, namely blacklists and pre-compiled dictionaries, a word2vec model, an N-gram language model and a
Four types of context for automatic spelling correction
This paper investigates the use of four types of contextual information for improving the accuracy of automatic correction of single-token non-word misspellings, and describes an implemented system that achieves high accuracy on this task.
Context-Sensitive Spell Checking Based on Field Association Terms Dictionaries
The values of precision, recall, and F indicate that the proposed algorithm achieves, on average, 90, 70, and 78% respectively, meaning it tends to produce a low percentage of false-negative errors.
Learning to Find Context Based Spelling Errors
This chapter presents an effective method called Ltest, which learns from prior correct text how context-based spelling errors may manifest themselves by purposely introducing such errors and analyzing the resulting text with a data mining algorithm.
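The Ltest summary above hinges on one concrete step: corrupting known-good text so that the errors, and their corrections, are known in advance. A minimal sketch of that data-generation step follows; the confusion sets and the corruption `rate` are illustrative assumptions, not details from the chapter.

```python
import random

# Illustrative confusion sets; the chapter's actual error model is
# not specified here.
CONFUSION_SETS = [
    {"their", "there", "they're"},
    {"than", "then"},
    {"piece", "peace"},
]

def inject_errors(tokens, rate=0.1, rng=random):
    """Swap confusable words into clean text, returning the corrupted
    tokens plus (position, intended word) labels for training."""
    corrupted, labels = list(tokens), []
    for i, tok in enumerate(tokens):
        for cset in CONFUSION_SETS:
            if tok in cset and rng.random() < rate:
                corrupted[i] = rng.choice(sorted(cset - {tok}))
                labels.append((i, tok))
    return corrupted, labels

# e.g. inject_errors("I like their idea more than that".split(), rate=1.0)
```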
Automatic detection and correction of context-dependent dt-mistakes using neural networks
A novel approach to correcting context-dependent dt-mistakes, one of the most frequent spelling errors in the Dutch language, is introduced, along with a method to determine which words in a sentence cause the system to make corrections, which is valuable for providing feedback to the user.
Real-Word Typo Detection
It is argued that context-sensitive spelling correction (CSSC) has its limitations as a model, and a weakened CSSC model, real-word typo detection (RWTD), is proposed to partially counter these limitations by dropping the word-correction role.
Detection is the central problem in real-word spelling correction
It is shown that the central problem in real-word spelling correction is detection, and that trigram models cannot reliably find true errors without introducing many more, at least not when used in the obvious sequential way without added structure.

References

Showing 1–10 of 67 references
A Bayesian Hybrid Method for Context-sensitive Spelling Correction
This paper takes Yarowsky's work as a starting point, applying decision lists to the problem of context-sensitive spelling correction, and finds that further improvements can be obtained by taking into account not just the single strongest piece of evidence, but ALL the available evidence.
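To make that contrast concrete: a Yarowsky-style decision list classifies on the single most discriminating matching feature alone, while the hybrid accumulates evidence from every matching feature. A minimal sketch with hypothetical feature scores follows; the feature names and numbers are invented for illustration, and a two-word confusion set is assumed.

```python
# Hypothetical log-likelihood scores per (feature, candidate);
# in practice these would be estimated from training counts.
SCORES = {
    ("next_word=storm", "desert"): -2.0,
    ("next_word=storm", "dessert"): -8.0,
    ("context_has=cake", "dessert"): -1.0,
    ("context_has=cake", "desert"): -6.0,
}

def decision_list(features, candidates):
    """Decide using only the single most discriminating feature."""
    a, b = candidates                  # assumes a two-word confusion set
    strongest = max(features,
                    key=lambda f: abs(SCORES.get((f, a), 0.0)
                                      - SCORES.get((f, b), 0.0)))
    return max(candidates,
               key=lambda c: SCORES.get((strongest, c), float("-inf")))

def all_evidence(features, candidates):
    """Decide by summing the evidence from every matching feature."""
    return max(candidates,
               key=lambda c: sum(SCORES.get((f, c), 0.0) for f in features))
```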
Learning to Resolve Natural Language Ambiguities: A Unified Approach
  • D. Roth
  • Computer Science
    AAAI/IAAI
  • 1998
An extensive experimental comparison of the approach with other methods on several well-studied lexical disambiguation tasks, such as context-sensitive spelling correction, prepositional phrase attachment, and part-of-speech tagging, shows that it outperforms the other methods tried for these tasks or performs comparably to the best.
Part of Speech Tagging Using a Network of Linear Separators
An architecture and an on-line learning algorithm are presented that utilize this mistake-driven algorithm for multi-class prediction (selecting the part of speech of a word), and it is shown that the algorithm performs comparably to the best known algorithms for POS tagging.
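The architecture described here amounts to one linear separator per class, with the highest-scoring separator picking the tag. A rough sketch under that reading; the multiplicative promotion/demotion rule below is an illustrative stand-in, not the paper's exact update.

```python
class OneVsAll:
    """One linear separator per class; the class whose separator
    scores the active features highest wins."""

    def __init__(self, classes, n_features, alpha=1.5):
        self.alpha = alpha
        self.w = {c: [1.0] * n_features for c in classes}

    def predict(self, active):
        return max(self.w, key=lambda c: sum(self.w[c][i] for i in active))

    def update(self, active, label):
        """Mistake-driven: promote the true class's separator and
        demote the wrongly chosen one, on the active features only."""
        pred = self.predict(active)
        if pred == label:
            return
        for i in active:
            self.w[label][i] *= self.alpha
            self.w[pred][i] /= self.alpha
```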
Combining Trigram-based and Feature-based Methods for Context-Sensitive Spelling Correction
A hybrid method called Tribayes is introduced that combines the best of the two previous methods (one based on word trigrams, one feature-based) and is found to have substantially higher performance than the grammar checker in Microsoft Word.
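As a rough illustration of the trigram half of such a hybrid, each confusion-set candidate can be scored by the log-likelihood a word-trigram model assigns to the sentence containing it. The `logprob` argument below stands in for an assumed smoothed trigram model; this sketch is not Tribayes itself.

```python
def trigram_score(tokens, logprob):
    """Sum log P(w_i | w_{i-2}, w_{i-1}) over the sentence; `logprob`
    maps a 3-tuple of words to a (smoothed) log-probability."""
    padded = ["<s>", "<s>"] + list(tokens)
    return sum(logprob(tuple(padded[i - 2:i + 1]))
               for i in range(2, len(padded)))

def pick_candidate(tokens, pos, confusion_set, logprob):
    """Return the confusion-set member that, substituted at `pos`,
    yields the most probable sentence under the trigram model."""
    def score(word):
        return trigram_score(tokens[:pos] + [word] + tokens[pos + 1:],
                             logprob)
    return max(confusion_set, key=score)
```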
Mistake-Driven Learning in Text Categorization
This work studies three mistake-driven learning algorithms for a typical task of this nature, text categorization, and presents an algorithm, a variation of Littlestone's Winnow, that performs significantly better than any other algorithm tested on this task with a similar feature set.
Redundant noisy attributes, attribute errors, and linear-threshold learning using winnow
This work studies the performance of the linear-threshold algorithm, Winnow, in the presence of attribute errors in the data available for learning, and examines probabilistic mistake bounds that can be obtained by making stronger assumptions about the instances seen by the learner in models of noisy redundant information.
The Weighted Majority Algorithm
A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm, which is robust in the presence of errors in the data, and is called the Weighted Majority Algorithm.
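The voting scheme itself is compact enough to sketch; `beta` here is an illustrative penalty factor, not a value from the paper.

```python
class WeightedMajority:
    """Keep one weight per expert, predict by weighted vote, and
    multiplicatively penalize every expert that voted wrongly."""

    def __init__(self, n_experts, beta=0.5):
        self.beta = beta                 # penalty factor in (0, 1)
        self.w = [1.0] * n_experts

    def predict(self, votes):
        """`votes` holds one boolean prediction per expert."""
        yes = sum(w for w, v in zip(self.w, votes) if v)
        no = sum(w for w, v in zip(self.w, votes) if not v)
        return yes >= no

    def update(self, votes, label):
        for i, v in enumerate(votes):
            if v != label:
                self.w[i] *= self.beta
```

The compound algorithm's mistakes can be bounded in terms of the best single expert's mistakes, which is the source of the robustness to data errors mentioned above.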
Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm
  • N. Littlestone
  • Mathematics
    28th Annual Symposium on Foundations of Computer Science (sfcs 1987)
  • 1987
Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in
Comparing Several Linear-threshold Learning Algorithms on Tasks Involving Superfluous Attributes
Simulations are used to compare several linear-threshold learning algorithms that differ greatly in how superfluous attributes affect their learning abilities, including a Bayesian algorithm for conditionally independent attributes and two mistake-driven algorithms, Winnow and the Perceptron algorithm.
A method for disambiguating word senses in a large corpus
TLDR
The proposed method was designed to disambiguate senses that are usually associated with different topics, using a Bayesian argument that has been applied successfully in related tasks such as author identification and information retrieval.