Corpus ID: 245650784

Semantic Search for Large Scale Clinical Ontologies.

Duy-Hoa Ngo, Madonna Kemp, Donna Truran, Bevan Koopman, Alejandro Metke-Jimenez. AMIA ... Annual Symposium Proceedings. AMIA Symposium.
Finding concepts in large clinical ontologies can be challenging when queries use different vocabularies. A search algorithm that overcomes this problem is useful in applications such as concept normalisation and ontology matching, where concepts can be referred to in different ways, using different synonyms. In this paper, we present a deep-learning-based approach to building a semantic search system for large clinical ontologies. We propose a Triplet-BERT model and a method that generates…
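The abstract is truncated and does not specify how the Triplet-BERT model works, so the following is an illustrative sketch only, not the authors' method. It shows the two generic ingredients a triplet-based semantic search system typically combines: a triplet margin loss (an encoder is trained so a query sits closer to a synonym of the same concept than to an unrelated concept) and nearest-neighbour search over concept embeddings at query time. All function names and vectors here are hypothetical:

```python
import math

def _dist(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: the anchor (query embedding) should be closer to
    the positive (a synonym of the same concept) than to the negative (an
    unrelated concept) by at least `margin`."""
    return max(0.0, _dist(anchor, positive) - _dist(anchor, negative) + margin)

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def search(query_vec, concept_vecs, top_k=3):
    """Rank ontology concept embeddings by cosine similarity to the query
    embedding and return the indices of the top_k closest concepts."""
    sims = [(cosine(query_vec, c), i) for i, c in enumerate(concept_vecs)]
    sims.sort(reverse=True)
    return [i for _, i in sims[:top_k]]
```

In practice the embeddings would come from a BERT-style encoder fine-tuned with the triplet objective; the toy vectors above stand in for those embeddings purely to make the ranking step concrete.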


A Hybrid Normalization Method for Medical Concepts in Clinical Narrative using Semantic Matching.

A hybrid normalization system that incorporates a deep learning model to complement the traditional dictionary lookup approach is developed, revealing existing inconsistencies in ShARe/CLEF data, as well as problematic ambiguities in the UMLS.

Ontoserver: a syndicated terminology server

Ontoserver is a clinical terminology server implementation that aims to overcome some of the challenges that have hindered adoption of standardised clinical terminologies and is used in several organisations throughout Australia.

Clinical information extraction applications: A literature review

Algorithmic and user study of an autocompletion algorithm on a large medical vocabulary

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

This article introduces BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora that largely outperforms BERT and previous state-of-the-art models on a variety of biomedical text mining tasks.

A large annotated corpus for learning natural language inference

The Stanford Natural Language Inference corpus is introduced: a new, freely available collection of labeled sentence pairs written by humans performing a novel grounded task based on image captioning. Its scale allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.

SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Crosslingual Focused Evaluation

The STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017), providing insight into the limitations of existing models.

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

The Multi-Genre Natural Language Inference corpus is introduced, a dataset designed for use in the development and evaluation of machine learning models for sentence understanding and shows that it represents a substantially more difficult task than does the Stanford NLI corpus.