Improved Biomedical Word Embeddings in the Transformer Era

Authors: Jiho Noh and Ramakanth Kavuluru
Journal: Journal of Biomedical Informatics

Ensemble-based Methods for Multi-label Classification on Biomedical Question-Answer Data

Ensemble models outperform single models for multi-label classification of biomedical QA data, and the results show that heterogeneous ensembles are more potent than homogeneous ensembles on biomedical QA data with long text dimensions.

An Efficient multi-class SVM and Bayesian network based biomedical document ranking and classification framework using Gene-disease and ICD drug discovery databases

An integrated gene-disease database and ICD drug database codes are used to train the model, with an optimized SVM classification model and a Bayesian estimation model used to optimize the word embedding model along with key-phrase ranking and classification.

Artificial Intelligence in Pharmacovigilance: An Introduction to Terms, Concepts, Applications, and Limitations

Machine learning, used in conjunction with natural language processing and data mining to study adverse drug reactions in sources such as electronic health records, claims databases, and social media, has the potential to enhance the characterization of known adverse effects and reactions and to detect new signals.



A Comparison of Word Embeddings for the Biomedical Natural Language Processing

BioWordVec, improving biomedical word embeddings with subword information and MeSH

This work presents BioWordVec: an open set of biomedical word vectors/embeddings that combines subword information from unlabeled biomedical text with a widely-used biomedical controlled vocabulary called Medical Subject Headings (MeSH).

How to Train good Word Embeddings for Biomedical NLP

It is found that bigger corpora do not necessarily produce better biomedical domain word embeddings, and one case is observed that creates contradictory results between intrinsic and extrinsic evaluations.

Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings

It is shown that weak supervision that leverages recent advances in representation learning can rival supervised approaches in biomedical WSD, and that external knowledge bases play a key role in the improvements achieved.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.

GloVe: Global Vectors for Word Representation

A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

Medical Semantic Similarity with a Neural Language Model

The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

Medical Concept Embedding with Time-Aware Attention

This paper proposes to incorporate temporal information to embed medical codes in EMRs using the Continuous Bag-of-Words model, which employs the attention mechanism to learn a "soft" time-aware context window for each medical concept.

Attention is All you Need

A new simple network architecture, the Transformer, is proposed. It is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and it generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.