Corpus ID: 237420775

GPT-3 Models are Poor Few-Shot Learners in the Biomedical Domain

Milad Moradi, Kathrin Blagec, Florian Haberl, Matthias Samwald
Deep neural language models have set new breakthroughs in many tasks of Natural Language Processing (NLP). Recent work has shown that deep transformer language models, pretrained on large amounts of text, can achieve task-specific few-shot performance comparable to state-of-the-art models. However, the few-shot transfer learning ability of these large language models has not yet been explored in the biomedical domain. We investigated the performance of two powerful… 
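The few-shot setting the paper investigates works by placing a handful of labeled demonstrations in the prompt itself, with no gradient updates. A minimal sketch of such prompt construction, assuming a hypothetical sentence-classification task (no actual GPT-3 API call is made; the function only assembles the prompt text):

```python
# Hypothetical sketch of few-shot in-context learning: labeled demonstrations
# are concatenated into a single prompt, and the model is expected to
# complete the label for the final, unlabeled query.

def build_prompt(examples, query, instruction="Classify the sentence."):
    """Assemble a few-shot prompt from (text, label) demonstration pairs."""
    parts = [instruction]
    for text, label in examples:
        parts.append(f"Sentence: {text}\nLabel: {label}")
    # The query is appended without a label; the model fills it in.
    parts.append(f"Sentence: {query}\nLabel:")
    return "\n\n".join(parts)

demos = [
    ("Aspirin reduced the incidence of stroke.", "RESULTS"),
    ("Patients were randomized into two groups.", "METHODS"),
]
prompt = build_prompt(demos, "The trial enrolled 120 participants.")
```

The resulting string would be sent as-is to a completion endpoint; the paper's finding is that this recipe, which works well on open-domain tasks, transfers poorly to biomedical text.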

Clinical Prompt Learning with Frozen Language Models

It is argued that prompt learning offers lower computational resource costs applicable to clinical settings, and can serve as an alternative to fine-tuning ever-larger PLMs.

Large Language Models are Zero-Shot Clinical Information Extractors

It is shown that large language models, such as GPT-3, perform well at zero-shot information extraction from clinical text despite not being trained specifically for the clinical domain, and that good resolvers share common components (e.g., “safety checks” that ensure the language model outputs faithfully match the input data).
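The "resolver" idea described above can be illustrated with a small sketch (hypothetical names, not the authors' code): a safety check that accepts an extracted span only if it literally occurs in the source note, so the language model cannot emit text absent from the input.

```python
# Hypothetical resolver with a faithfulness "safety check": candidate spans
# proposed by a language model are kept only if they appear verbatim
# (case-insensitively) in the clinical note they were extracted from.

def resolve_extraction(note: str, candidates: list[str]) -> list[str]:
    """Keep only candidate spans that faithfully match the input note."""
    note_lower = note.lower()
    return [c for c in candidates if c.strip().lower() in note_lower]

note = "Patient denies chest pain but reports shortness of breath."
raw_model_output = ["shortness of breath", "chest pain", "fever"]
accepted = resolve_extraction(note, raw_model_output)  # "fever" is rejected
```

Real resolvers would also normalize the model's formatting (lists, casing, trailing punctuation) before the check, but the core guarantee is the same: outputs are grounded in the input text.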

Can large language models reason about medical questions?

It is speculated that scaling model and data, enhancing prompt alignment and allowing for better contextualization of the completions will be sufficient for LLMs to reach human-level performance on this type of task.

The Ghost in the Machine has an American accent: value conflict in GPT-3

The alignment problem in the context of large language models must consider the plurality of human values in our world. Whilst there are many resonant and overlapping values amongst the world's… 

Misinfo Reaction Frames: Reasoning about Readers’ Reactions to News Headlines

This work demonstrates the feasibility and importance of pragmatic inferences on news headlines to help enhance AI-guided misinformation detection and mitigation and introduces a Misinfo Reaction Frames corpus, a crowdsourced dataset of reactions to over 25k news headlines focusing on global crises.

Digital Clinical Simulation Suite: Specifications and Architecture for Simulation-Based Pedagogy at Scale

The architecture of the Digital Clinical Simulation Suite is described and its use and efficacy in cases from online courses, colleges of education, and K-12 schools are illustrated.

Once Learning for Looking and Identifying Based on YOLO-v5 Object Detection

Analysis of the ability of a machine learning framework named “You Only Look Once” to perform the object localization task in a “Heuristic once learning” context showed that YOLO had difficulty generalizing simple abstractions of the characters, pointing to the need for new approaches to solve such challenges.

Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again

This is the first systematic and comprehensive study to compare the few-shot performance of GPT-3 in-context learning with smaller (i.e., BERT-sized) PLMs on two highly representative biomedical information extraction tasks, named entity recognition and relation extraction.

Lessons from Natural Language Inference in the Clinical Domain

This work introduces MedNLI, a dataset annotated by doctors for the natural language inference (NLI) task, grounded in the medical history of patients, and presents strategies to leverage transfer learning using datasets from the open domain and to incorporate domain knowledge from external data and lexical sources.

SciBERT: A Pretrained Language Model for Scientific Text

SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks and demonstrates statistically significant improvements over BERT.

BioBERT: a pre-trained biomedical language representation model for biomedical text mining

This article introduces BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model that, when pre-trained on large-scale biomedical corpora, largely outperforms BERT and previous state-of-the-art models on a variety of biomedical text mining tasks.

Scalable Few-Shot Learning of Robust Biomedical Name Representations

This paper validates the proposed few-shot learning approach as a low-cost alternative for exploring the impact of conceptual distinctions on robust biomedical name representations, and shows that it allows for continual learning, where it accumulates information from various conceptual hierarchies to consistently improve encoder performance.

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

MedSTS: a resource for clinical semantic textual similarity

This paper describes the efforts to assemble a resource for STS in the medical domain, MedSTS, which consists of a total of 174,629 sentence pairs gathered from a clinical corpus at Mayo Clinic, and analyzes the medical concepts in the MedSTS corpus.

Deep Contextualized Word Representations

A new type of deep contextualized word representation is introduced that models both complex characteristics of word use and how these uses vary across linguistic contexts, allowing downstream models to mix different types of semi-supervision signals.
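The "mixing" of representations described above can be sketched in a few lines (a toy illustration under assumed names, not the authors' implementation): a downstream model combines the per-layer hidden states with softmax-normalized learned weights and a global scale.

```python
# Toy sketch of ELMo-style scalar mixing: a task-specific representation is
# gamma * sum_j softmax(w)_j * h_j, where h_j is the hidden state of
# biLM layer j and the weights w are learned by the downstream model.
import math

def scalar_mix(layer_vectors, weights, gamma=1.0):
    """Weighted sum of per-layer vectors with softmax-normalized weights."""
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    s = [e / total for e in exps]  # softmax over the layer weights
    dim = len(layer_vectors[0])
    return [gamma * sum(s[j] * layer_vectors[j][i] for j in range(len(s)))
            for i in range(dim)]

# Three toy 2-dimensional layer states; equal weights average the layers.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rep = scalar_mix(layers, weights=[0.0, 0.0, 0.0])
```

With equal weights the mix reduces to a plain average of the layers; training the weights lets the task emphasize, say, syntax-heavy lower layers over semantics-heavy upper ones.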

XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet is proposed, a generalized autoregressive pretraining method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order, and overcomes the limitations of BERT thanks to its autoregressive formulation.

PubMedQA: A Dataset for Biomedical Research Question Answering

The best performing model, multi-phase fine-tuning of BioBERT with long answer bag-of-word statistics as additional supervision, achieves 68.1% accuracy, compared to single human performance of 78.0% accuracy and majority-baseline of 55.2% accuracy.

PubMed 200k RCT: a Dataset for Sequential Sentence Classification in Medical Abstracts

We present PubMed 200k RCT, a new dataset based on PubMed for sequential sentence classification. The dataset consists of approximately 200,000 abstracts of randomized controlled trials, totaling 2.3 million sentences.