Scaled Log Likelihood Ratios for the Detection of Abbreviations in Text Corpora

Abstract

We describe a language-independent, flexible, and accurate method for detecting abbreviations in text corpora. It is based on the idea that an abbreviation can be viewed as a collocation of a word and its final period, and can therefore be identified with methods for collocation detection such as the log likelihood ratio. Although the log likelihood ratio is known to achieve good recall, its precision is poor. We therefore apply scaling factors to the log likelihood ratio, which strongly improve precision. Experiments with English and German corpora show that abbreviations can be detected with high accuracy.
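To make the collocation idea concrete, the sketch below computes an unscaled log likelihood ratio (-2 log lambda) for a candidate word followed by a period. It is a minimal illustration under stated assumptions, not the authors' implementation: it uses a simplified binomial formulation after Dunning (1993), assumes periods are already separate tokens, and the names period_llr and _log_binomial are hypothetical. The scaling factors that the abstract credits for the precision gain are not specified here and are therefore not reproduced.

```python
import math
from collections import Counter


def _log_binomial(k: int, n: int, x: float) -> float:
    """Log of x**k * (1 - x)**(n - k), treating 0 * log(0) as 0."""
    result = 0.0
    if k > 0:
        result += k * math.log(x)
    if n - k > 0:
        result += (n - k) * math.log(1.0 - x)
    return result


def period_llr(tokens: list[str], word: str) -> float:
    """Unscaled -2 log(lambda) for the collocation (word, ".").

    Hypothetical helper. Assumes periods are separate tokens.
    H0: a period follows `word` at the same rate as any other token.
    H1: the rate after `word` differs from the rate after other tokens.
    """
    if word == ".":
        raise ValueError("the candidate word must not be the period itself")
    n = len(tokens)
    counts = Counter(tokens)
    c_w = counts[word]   # occurrences of the candidate word
    c_p = counts["."]    # occurrences of the period
    # occurrences of the candidate word immediately followed by a period
    c_wp = sum(1 for a, b in zip(tokens, tokens[1:]) if a == word and b == ".")
    if c_w in (0, n) or c_p in (0, n):
        return 0.0  # degenerate counts: no evidence either way
    p = c_p / n                    # shared period rate under H0
    p1 = c_wp / c_w                # period rate after `word` under H1
    p2 = (c_p - c_wp) / (n - c_w)  # period rate elsewhere under H1
    log_lambda = (
        _log_binomial(c_wp, c_w, p)
        + _log_binomial(c_p - c_wp, n - c_w, p)
        - _log_binomial(c_wp, c_w, p1)
        - _log_binomial(c_p - c_wp, n - c_w, p2)
    )
    return -2.0 * log_lambda


if __name__ == "__main__":
    tokens = "we saw cats , dogs , etc . and the birds , etc . then the end .".split()
    for candidate in ("etc", "the"):
        print(candidate, round(period_llr(tokens, candidate), 3))
```

On this toy token list, "etc" (always followed by a period) scores well above "the". Note that the ratio is two-sided: "the", which never precedes a period, still receives a small positive score, which hints at why an unscaled ratio alone yields poor precision.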

Cite this paper

@inproceedings{Kiss2002ScaledLL,
  title     = {Scaled Log Likelihood Ratios for the Detection of Abbreviations in Text Corpora},
  author    = {Tibor Kiss and Jan Strunk},
  booktitle = {COLING},
  year      = {2002}
}