A Preliminary Study of Clinical Abbreviation Disambiguation in Real Time.
CONTEXT Abbreviations are used frequently in pathology reports and medical records. Efforts to identify and organize free-text concepts must correctly interpret medical abbreviations. During the past decade, the author has collected more than 12 000 medical abbreviations, concentrating on terms used or interpreted by pathologists.
OBJECTIVE To provide readers with a listing of abbreviations and to review that listing to determine the variety of ways in which long forms are shortened.
DESIGN Abbreviations fell into different classes, separated by the algorithmic approaches that could be used to map abbreviations to their correct expansions. A discussion of these abbreviation classes is included to assist informaticians who are searching for ways to write software that expands abbreviations found in medical text. A Perl implementation was developed to automatically match expansions with Unified Medical Language System concepts.
MEASUREMENTS The abbreviation list contained 12 097 terms; 5772 abbreviations had unique expansions. There were 6325 polysemous abbreviation/expansion pairs. The expansions of 8599 abbreviations mapped to Unified Medical Language System concepts. Three hundred twenty-four abbreviations could be confused with unabbreviated words. Two hundred thirteen abbreviations had different expansions depending on whether the American or the British spellings were used. Nine hundred seventy abbreviations ended in the letter "s."
RESULTS There were 6 nonexclusive groups of abbreviations classed by expansion algorithm, as follows: (1) ephemeral; (2) hyponymous; (3) monosemous; (4) polysemous; (5) masqueraders of common words; and (6) fatal (abbreviations whose incorrect expansions could easily result in clinical errors).
CONCLUSION Collecting and classifying abbreviations creates a logical approach to the development of class-specific algorithms designed to expand abbreviations. A large listing of medical abbreviations is placed into the public domain. The most current version is available at http://www.pathologyinformatics.org/downloads/abbtwo.htm.
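
As an illustration of how a few of the classes described above might be assigned mechanically, the following is a minimal Python sketch, not the author's Perl implementation. The function name `classify`, the sample pairs, and the small common-word set are all hypothetical and are not drawn from the published listing. It assumes that an abbreviation with exactly one expansion is monosemous, that one with several is polysemous, and that a masquerader is an abbreviation that matches an ordinary word when case is ignored.

```python
from collections import defaultdict


def classify(pairs, common_words):
    """Assign abbreviations to a few nonexclusive classes.

    pairs: iterable of (abbreviation, expansion) tuples.
    common_words: set of lowercase unabbreviated words.
    """
    # Collect the distinct expansions recorded for each abbreviation.
    expansions = defaultdict(set)
    for abbreviation, expansion in pairs:
        expansions[abbreviation].add(expansion)

    classes = {
        "monosemous": set(),
        "polysemous": set(),
        "masquerader": set(),
        "ends_in_s": set(),
    }
    for abbreviation, forms in expansions.items():
        # One expansion: monosemous. More than one: polysemous.
        key = "monosemous" if len(forms) == 1 else "polysemous"
        classes[key].add(abbreviation)
        # Classes are nonexclusive, so these checks are independent.
        if abbreviation.lower() in common_words:
            classes["masquerader"].add(abbreviation)
        if abbreviation.endswith("s"):
            classes["ends_in_s"].add(abbreviation)
    return classes


# Hypothetical sample data, for illustration only.
SAMPLE_PAIRS = [
    ("CBC", "complete blood count"),
    ("MS", "multiple sclerosis"),
    ("MS", "mitral stenosis"),
    ("AS", "aortic stenosis"),
    ("AS", "ankylosing spondylitis"),
    ("LNs", "lymph nodes"),
]
COMMON_WORDS = {"as", "at", "is"}

result = classify(SAMPLE_PAIRS, COMMON_WORDS)
```

Because the groups are nonexclusive, a single abbreviation can appear in several sets at once. The ephemeral, hyponymous, and fatal classes are omitted here because they depend on judgments that the abstract does not reduce to a simple rule.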