Scaling up WSD with Automatically Generated Examples

Weiwei Cheng; Judita Preiss; Mark Stevenson

Corpus ID: 1946135


@inproceedings{Cheng2012ScalingUW,
  title={Scaling up WSD with Automatically Generated Examples},
  author={Weiwei Cheng and Judita Preiss and Mark Stevenson},
  booktitle={BioNLP@HLT-NAACL},
  year={2012},
  url={https://api.semanticscholar.org/CorpusID:1946135}
}

Published in BioNLP@HLT-NAACL 8 June 2012
Computer Science, Medicine

This paper describes a large-scale WSD system based on automatically labeled examples generated using information from the UMLS Metathesaurus. The system is found to outperform a state-of-the-art unsupervised approach that also uses information from the Metathesaurus.

9 Citations


Topics

Word Sense Disambiguation, WSD System, Data Sets, State Of The Art, Ambiguous Terms, Supervised Learning, Labeled Training Examples

DALE: A Word Sense Disambiguation System for Biomedical Documents Trained using Automatically Labeled Examples

Judita Preiss, Mark Stevenson

Computer Science, Medicine

North American Chapter of the Association for…

2013

DALE (Disambiguation using Automatically Labeled Examples) is a supervised WSD system that can disambiguate a wide range of ambiguities found in biomedical documents and uses the UMLS Metathesaurus as both a sense inventory and a source of information for automatically generating labeled training examples.

The effect of word sense disambiguation accuracy on literature based discovery

Judita Preiss, Mark Stevenson

Computer Science, Medicine

BMC Medical Informatics and Decision Making

2016

This study reveals that LBD performance is sensitive to WSD accuracy, and concludes that WSD has the potential to improve the output of LBD systems by reducing the amount of spurious hidden knowledge that is generated.

Acronym Disambiguation in Clinical Notes from Electronic Health Records

N. Link, Selena Huang, C. Hong

Medicine, Computer Science

medRxiv

2020

This study introduces an unsupervised method for acronym disambiguation, the task of identifying the correct sense of acronyms in clinical EHR notes, and demonstrates that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis.

Evaluating knowledge-poor and knowledge-rich features in automatic classification: A case study in WSD

Marcos Zampieri

Computer Science, Linguistics

International Symposium on Computational…

2012

This work evaluates the automatic disambiguation performance of five machine learning classifiers: Naive Bayes, Support Vector Machines, Decision Trees, KStar and Maximum Entropy.

Semantic Type Classification of Common Words in Biomedical Noun Phrases

A. Siu, G. Weikum

Computer Science, Medicine

BioNLP@IJCNLP

2015

The task of classifying common nouns into fine-grained semantic types is addressed: for instance, “condition” can be typed as “symptom and finding” or “configuration and setting”.

Knowledge based word-concept model estimation and refinement for biomedical text mining.

Antonio Jose Jimeno Yepes, R. Berlanga

Computer Science, Medicine

Journal of Biomedical Informatics

2015

Tailored semantic annotation for semantic search

Rafael Berlanga Llavori, V. Nebot, María Pérez Catalán

Computer Science

Journal of Web Semantics

2015

The Unified Medical Language System at 30 Years and How It Is Used and Published: Systematic Review and Content Analysis

X. Jing

Medicine, Computer Science

JMIR Medical Informatics

2021

The results, although largely related to academia, demonstrate that UMLS achieves its intended uses successfully, in addition to achieving uses broadly beyond its original intentions.



27 References

Unsupervised WSD based on Automatically Retrieved Examples: The Importance of Bias

Eneko Agirre, David Martínez

Computer Science

Conference on Empirical Methods in Natural…

2004

The “WordNet monosemous relatives” method is applied to automatically construct a web corpus that is used to train WSD algorithms, including supervised, minimally supervised, and fully unsupervised methods.

Disambiguation of ambiguous biomedical terms using examples generated from the UMLS Metathesaurus

Mark Stevenson, Yikun Guo

Computer Science, Medicine

Journal of Biomedical Informatics

2010

Knowledge-based biomedical word sense disambiguation: comparison of approaches

Antonio Jimeno-Yepes, A. Aronson

Computer Science, Medicine

BMC Bioinformatics

2010

Four approaches which rely on the UMLS Metathesaurus as the source of knowledge to perform word sense disambiguation (WSD) achieve better results, but the performance is still below statistical learning trained on manually produced data and below the maximum frequency sense baseline.

Self-training and co-training in biomedical word sense disambiguation

Antonio Jimeno-Yepes, A. Aronson

Computer Science, Medicine

BioNLP@ACL

2011

Preliminary results of two semi-supervised learning algorithms on biomedical word sense disambiguation are presented, which add relevant unlabeled examples to the training set, and optimal parameters are similar for each ambiguous word.

Graph-based Word Sense Disambiguation of biomedical documents

Eneko Agirre, Aitor Soroa Etxabe, Mark Stevenson

Computer Science, Medicine

Bioinform.

2010

A graph-based approach to WSD in the biomedical domain, which represents knowledge from the Unified Medical Language System (UMLS) Metathesaurus as a graph, outperforms other methods that rely on the UMLS Metathesaurus alone.

Effects of information and machine learning algorithms on word sense disambiguation with small datasets

Gondy Leroy, Thomas C. Rindflesch

Computer Science, Medicine

Int. J. Medical Informatics

2005

Research Paper: Automatic Resolution of Ambiguous Terms Based on Machine Learning and Conceptual Relations in the UMLS

Hongfang Liu, Stephen B. Johnson, C. Friedman

Computer Science

J. Am. Medical Informatics Assoc.

2002

An automatic method is proposed that constructs sense-tagged corpora for ambiguous terms in the UMLS using MEDLINE abstracts and can be used to automatically acquire the knowledge needed to resolve ambiguity when mapping free text to UMLS concepts.

Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation

Antonio Jimeno-Yepes, Bridget T. McInnes, A. Aronson

Computer Science, Medicine

BMC Bioinformatics

2010

A method that can be used to automatically develop a WSD test collection using the Unified Medical Language System (UMLS) Metathesaurus and the manual MeSH indexing of MEDLINE is presented and allows the evaluation of WSD algorithms in the biomedical domain.

Using Corpus Statistics and WordNet Relations for Sense Identification

C. Leacock, M. Chodorow, G. Miller

Computer Science, Linguistics

International Conference on Computational Logic

1998

A statistical classifier is described that combines topical context with local cues to identify a word sense and is used to disambiguate a noun, a verb, and an adjective.

Disambiguation of biomedical text using diverse sources of information

Mark Stevenson, Yikun Guo, R. Gaizauskas, David Martínez

Computer Science, Medicine

BMC Bioinformatics

2008

Disambiguation of biomedical terms benefits from the use of information from a variety of sources including linguistic features of the context in which the ambiguous term is used and domain-specific resources, such as UMLS.
