Stress Test Evaluation of Biomedical Word Embeddings

@article{Araujo2021StressTE,
  title={Stress Test Evaluation of Biomedical Word Embeddings},
  author={Vladimir Araujo and Andr{\'e}s Carvallo and C. Aspillaga and C. Thorne and Denis Parra},
  journal={ArXiv},
  year={2021},
  volume={abs/2107.11652}
}
The success of pretrained word embeddings has motivated their use in the biomedical domain, with contextualized embeddings yielding remarkable results in several biomedical NLP tasks. However, there is a lack of research on quantifying their behavior under severe “stress” scenarios. In this work, we systematically evaluate three language models with adversarial examples: automatically constructed tests that allow us to examine how robust the models are. We propose two types of stress scenarios…
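The abstract describes stress tests as automatically constructed adversarial examples. As a minimal sketch of what such a construction can look like, the snippet below perturbs input text with keyboard-neighbor typos; the specific perturbations used by the paper may differ, and the neighbor map here is an illustrative assumption.

```python
import random

# Hedged sketch: building "stress test" inputs by automatic character
# perturbation. The keyboard-neighbor map below is a small illustrative
# assumption, not the paper's actual perturbation scheme.
KEYBOARD_NEIGHBORS = {
    "a": "qs", "s": "ad", "d": "sf", "e": "wr", "i": "uo", "o": "ip",
}

def typo_perturb(sentence: str, rate: float = 0.1, seed: int = 0) -> str:
    """Replace a fraction of mapped characters with a keyboard neighbor."""
    rng = random.Random(seed)  # fixed seed keeps tests reproducible
    chars = list(sentence)
    for idx, ch in enumerate(chars):
        if ch.lower() in KEYBOARD_NEIGHBORS and rng.random() < rate:
            chars[idx] = rng.choice(KEYBOARD_NEIGHBORS[ch.lower()])
    return "".join(chars)

original = "aspirin reduces inflammation"
perturbed = typo_perturb(original, rate=0.3)
```

A robustness evaluation would then compare model predictions on `original` versus `perturbed` inputs and measure how much performance degrades.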


Citations

Rating and aspect-based opinion graph embeddings for explainable recommendations
This paper proposes to exploit embeddings extracted from graphs that combine information from ratings and aspect-based opinions expressed in textual reviews, and adapts and evaluates state-of-the-art graph embedding techniques over graphs generated from Amazon and Yelp reviews on six domains.

Graphing else matters: exploiting aspect opinions and ratings in explainable graph-based recommendations
This paper proposes to exploit embeddings extracted from graphs that combine information from ratings and aspect-based opinions expressed in textual reviews, and adapts and evaluates state-of-the-art graph embedding techniques over graphs generated from Amazon and Yelp reviews on six domains, outperforming baseline recommenders.

References

Showing 1-10 of 42 references
Adversarial Evaluation of BERT for Biomedical Named Entity Recognition
The success of pre-trained word embeddings of the BERT model has motivated its use in tasks in the biomedical domain. However, it is not clear if this model works correctly in real scenarios. In this…
Probing Biomedical Embeddings from Language Models
As a fixed feature extractor, BioELMo outperforms BioBERT in probing tasks, and visualization and nearest-neighbor analysis are used to show that better encoding of entity-type and relational information leads to this superiority.
Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
Evaluation of Transformer-based models on Natural Language Inference and Question Answering tasks reveals that RoBERTa, XLNet, and BERT are more robust than recurrent neural network models to stress tests for both NLI and QA, although there is still room for future improvement in this field.
Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets
The Biomedical Language Understanding Evaluation (BLUE) benchmark is introduced to facilitate research on pre-trained language representations in the biomedical domain, and the BERT model pre-trained on PubMed abstracts and MIMIC-III clinical notes is found to achieve the best results.
Stress Test Evaluation for Natural Language Inference
This work proposes an evaluation methodology consisting of automatically constructed “stress tests” that examine whether systems have the ability to make real inferential decisions, and reveals strengths and weaknesses of these models with respect to challenging linguistic phenomena.
Deep Contextualized Word Representations
A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings
The results on two patent corpora show that contextualized word representations generated from ELMo substantially improve chemical NER performance w.r.t. the current state-of-the-art.
Automatic document screening of medical literature using word and text embeddings in an active learning setting
The results indicate that word and textual neural embeddings consistently outperform the traditional TF-IDF representation, and comparing the best models, trained using active learning, with other authors' methods from CLEF eHealth shows better results in terms of work saved for physicians in the document-screening task.
Adversarial Attacks on Deep-learning Models in Natural Language Processing
A systematic survey covering preliminary knowledge of NLP and related seminal works in computer vision is presented, collecting all related academic works since their first appearance in 2017 and analyzing 40 representative works in a comprehensive way.
Adversarial Examples for Evaluating Reading Comprehension Systems
This work proposes an adversarial evaluation scheme for the Stanford Question Answering Dataset that tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences without changing the correct answer or misleading humans.