• Corpus ID: 11568211

Detecting Malapropisms Using Measures of Contextual Fitness

  • Torsten Zesch
  • Published 2012
  • Art, Computer Science
  • Trait. Autom. des Langues
While detecting simple language errors (e.g. misspellings or number agreement) is nowadays standard functionality in all but the simplest text editors, other, more complicated language errors may go unnoticed. A difficult case is errors that come in the disguise of a valid word that fits syntactically into the sentence. We use the Wikipedia revision history to extract a dataset with such errors in their context. We show that the new dataset provides a more realistic picture of the… 
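The general idea behind a contextual-fitness check can be sketched as follows. This is an illustrative toy, not the paper's exact method: the trigram counts, the `flag_malapropism` helper, and the neighbour set are all hypothetical stand-ins for a large background corpus and a spelling-variation generator.

```python
from collections import Counter

# Hypothetical trigram counts standing in for a large background corpus.
TRIGRAMS = Counter({
    ("the", "drawing", "board"): 120,
    ("back", "to", "the"): 95,
})

def fitness(context, word):
    """Contextual fitness of `word`: frequency of the trigram (w-2, w-1, word)."""
    return TRIGRAMS[(context[0], context[1], word)]

def flag_malapropism(context, word, neighbours):
    """Return a better-fitting spelling neighbour of `word`, or None."""
    best = max(neighbours, key=lambda w: fitness(context, w))
    if best != word and fitness(context, best) > fitness(context, word):
        return best
    return None

# "back to the drawing bored": "bored" is flagged, "board" is suggested.
print(flag_malapropism(("the", "drawing"), "bored", {"bored", "board"}))  # board
```

A real system would replace the toy counts with a web-scale n-gram model and generate the neighbour set from edit distance over a lexicon.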
Word2Vec vs LSA for Detecting Spelling Errors that Cause Semantic Disruption in Arabic (Word2Vec vs LSA pour la détection des erreurs orthographiques produisant un dérèglement sémantique en arabe)
Arabic words are lexically close to each other, so the probability of obtaining a valid word by making a typographical error is greater
Analogy-based Text Normalization: the Case of Unknown Words (Normalisation de textes par analogie : le cas des mots inconnus) [in French]
A system for improving the quality of noisy texts containing non-word errors is described; it uses an analogy-based approach to acquire normalisation rules and applies them in the same way as lexical spelling-correction rules.
Normalisation de textes par analogie : le cas des mots inconnus
Abstract. In this article, we propose and evaluate a system for improving the quality of a text made noisy notably by spelling errors. This system is intended to be integrated into
On Detection of Malapropisms by Multistage Collocation Testing
An algorithm for malapropism detection and correction based on evaluating text cohesion is presented; the cohesion of a word is measured as the number of collocations it forms with the words in its immediate context.
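The cohesion measure described above can be sketched with a pair-based collocation count. This is a minimal illustration under assumed data: the collocation set is a toy stand-in for a real collocation dictionary.

```python
# Toy collocation set standing in for a dictionary mined from a corpus.
COLLOCATIONS = {("dairy", "products"), ("spell", "checker"), ("hot", "water")}

def cohesion(word, context):
    """Cohesion of `word`: number of collocations it forms with context words."""
    return sum((w, word) in COLLOCATIONS or (word, w) in COLLOCATIONS
               for w in context)

# "diary products": the malapropism "diary" forms no collocation with its
# context, while the intended word "dairy" does.
print(cohesion("diary", ["products", "fresh"]))  # 0
print(cohesion("dairy", ["products", "fresh"]))  # 1
```

A word whose cohesion is low, while a spelling variant of it has high cohesion, is a malapropism candidate.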
HOO 2012: A Report on the Preposition and Determiner Error Correction Shared Task
This paper reports on the HOO 2012 shared task on error detection and correction in the use of prepositions and determiners, where systems developed by 14 teams from around the world were evaluated on the same previously unseen errorful text.
Mining Naturally-occurring Corrections and Paraphrases from Wikipedia's Revision History
A new freely-available resource built by automatically mining Wikipedia's revision history is described: the WiCoPaCo corpus focuses on local modifications made by human revisers and includes various types of corrections and rewritings, which can be categorized broadly into meaning-preserving and meaning-altering revisions.
Correcting real-word spelling errors by restoring lexical cohesion
A method for detecting and correcting many spelling errors by identifying tokens that are semantically unrelated to their context and are spelling variations of words that would be related to the context is presented.
Scaling Up Context-Sensitive Text Correction
This paper examines and offers solutions to several issues relating to scaling up a context-sensitive text correction system, suggests methods to reduce memory requirements while maintaining a high level of performance, and shows how to significantly increase coverage to realistic levels.
Mining Wikipedia's Article Revision History for Training Computational Linguistics Algorithms
A novel paradigm for obtaining large amounts of training data for computational linguistics tasks by mining Wikipedia’s article revision history is presented and it is proposed to use a sentence's persistence throughout a document's evolution as an indicator of its fitness as part of an extractive summary.
Detecting errors in English article usage by non-native speakers
One of the most difficult challenges faced by non-native speakers of English is mastering the system of English articles. A maximum entropy classifier was trained to select among a/an, the, or the zero article for noun phrases (NPs), based on a set of features extracted from the local context of each, and was used to detect article errors in TOEFL essays of native speakers of Chinese, Japanese, and Russian.
Annotating ESL Errors: Challenges and Rewards
An analysis of errors in the annotated corpus by error categories and first language backgrounds, as well as inter-annotator agreement on the task are shown.
Real-Word Spelling Correction using Google Web 1T 3-grams
We present a method for detecting and correcting multiple real-word spelling errors using the Google Web 1T 3-gram data set and a normalized and modified version of the Longest Common Subsequence string-matching algorithm.
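The two ingredients named above can be sketched together: a normalized Longest Common Subsequence similarity to generate spelling candidates, and 3-gram counts to rank them in context. The trigram counts, lexicon, threshold, and the `correct` helper below are toy assumptions; the real method uses the Google Web 1T data and further modifications.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of strings a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a):
        for j, cb in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if ca == cb else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def nlcs(a, b):
    """Normalized LCS similarity in [0, 1]."""
    return lcs_len(a, b) ** 2 / (len(a) * len(b))

# Toy trigram counts standing in for the Google Web 1T 3-gram data.
TRIGRAMS = {("piece", "of", "cake"): 1000, ("peace", "of", "cake"): 1}

def correct(word, right, lexicon, threshold=0.5):
    """Pick the lexicon word similar to `word` that is most frequent in context."""
    candidates = [w for w in lexicon if nlcs(word, w) >= threshold]
    return max(candidates, key=lambda w: TRIGRAMS.get((w,) + tuple(right), 0))

# "a peace of cake": "peace" is replaced by the better-attested "piece".
print(correct("peace", ("of", "cake"), {"piece", "peace", "pace"}))  # piece
```

Normalizing the LCS length by the word lengths keeps short accidental overlaps from qualifying as candidates.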