CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering
@article{Yue2020CliniQG4QAGD,
  title={CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering},
  author={Xiang Yue and Xinliang Frederick Zhang and Ziyu Yao and Simon M. Lin and Huan Sun},
  journal={2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
  year={2020},
  pages={580--587}
}
Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts. Studies show that neural QA models trained on one corpus may not generalize well to new clinical texts from a different institute or a different patient group, where large-scale QA pairs are not readily available for model retraining. To address this challenge, we propose a simple yet effective framework, CliniQG4QA, which leverages question generation (QG) to synthesize QA…
14 Citations
QA Domain Adaptation using Hidden Space Augmentation and Self-Supervised Contrastive Adaptation
- Computer Science
- ArXiv
- 2022
This paper proposes a novel self-supervised framework called QADA for QA domain adaptation, which introduces a novel data augmentation pipeline used to augment training QA samples and develops an augmentation method which learns to drop context spans via a custom attentive sampling strategy.
Learning to Ask Like a Physician
- Computer Science, Medicine
- CLINICALNLP
- 2022
Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions paired with the snippets of text (triggers) that prompted each question, to characterize the types of information sought by medical experts.
DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries
- Computer Science
- LREC
- 2022
The goal is to provide a benchmark dataset for multi-modal QA systems, and to open up new avenues of research in improving question answering over EHR structured data by using context from unstructured clinical data.
Towards More Robust Natural Language Understanding
- Computer Science
- ArXiv
- 2021
This thesis argues that, to achieve robust NLU, the model architecture/training and the dataset are equally important, and focuses on three NLU tasks to illustrate the robustness problem in different NLU tasks and its contributions toward more robust natural language understanding.
JCSE: Contrastive Learning of Japanese Sentence Embeddings and Its Applications
- Computer Science
- ArXiv
- 2023
A novel Japanese sentence representation framework, JCSE (derived from “Contrastive learning of Sentence Embeddings for Japanese”), is proposed that creates training data by generating sentences and synthesizing them with sentences available in a target domain and empirically demonstrates JCSE’s effectiveness and practicability for downstream tasks of a low-resource language.
Automatic Question Generation from Indonesian Texts Using Text-to-Text Transformers
- Computer Science
- 2022 International Conference on Electrical and Information Technology (IEIT)
- 2022
This study proposes an AQG system that utilizes a recent Transformer model, the multilingual Text-to-Text Transfer Transformer (mT5), and fine-tunes the mT5 model to extract answers from context and generate questions based on those answers.
Generative Entity-to-Entity Stance Detection with Knowledge Graph Augmentation
- Computer Science
- ArXiv
- 2022
A new task, entity-to-entity (E2E) stance detection, is introduced, which primes models to identify entities in their canonical names and discern stances jointly, and a novel generative framework is presented to allow the generation of canonical names for entities as well as stances among them.
RadQA: A Question Answering Dataset to Improve Comprehension of Radiology Reports
- Medicine
- LREC
- 2022
A thorough analysis of the proposed RadQA dataset is conducted, examining the broad categories of disagreement in annotation and the reasoning requirements to answer a question (uncovering the huge dependence on medical knowledge for answering the questions).
Question Generation for Reading Comprehension Assessment by Modeling How and What to Ask
- Computer Science
- FINDINGS
- 2022
A two-step model (HTA-WTA) that takes advantage of previous datasets, and can generate questions for a specific targeted comprehension skill, and a new reading comprehension dataset that contains questions annotated with story-based reading comprehension skills (SBRCS), allowing for a more complete reader assessment.
BioADAPT-MRC: adversarial learning-based domain adaptation improves biomedical machine reading comprehension task
- Computer Science, Biology
- Bioinform.
- 2022
An adversarial learning-based domain adaptation framework for the biomedical machine reading comprehension task (BioADAPT-MRC), a neural network-based method to address the discrepancies in the marginal distributions between the general and biomedical domain datasets.
References
Showing 1-10 of 61 references
Publicly Available Clinical BERT Embeddings
- Computer Science
- Proceedings of the 2nd Clinical Natural Language Processing Workshop
- 2019
This work explores and releases two BERT models for clinical text: one for generic clinical text and another for discharge summaries specifically, and demonstrates that using a domain-specific model yields performance improvements on 3/5 clinical NLP tasks, establishing a new state-of-the-art on the MedNLI dataset.
MIMIC-III, a freely accessible critical care database
- Medicine
- Scientific Data
- 2016
MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care…
Reading Wikipedia to Answer Open-Domain Questions
- Computer Science
- ACL
- 2017
This approach combines a search component based on bigram hashing and TF-IDF matching with a multi-layer recurrent neural network model trained to detect answers in Wikipedia paragraphs, indicating that both modules are highly competitive with respect to existing counterparts.
Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset
- Computer Science
- ACL
- 2020
An in-depth analysis of the emrQA dataset and the clinical reading comprehension (CliniRC) task is provided, and the abilities to utilize clinical domain knowledge and to generalize to unseen questions and contexts are explored.
Mixture Content Selection for Diverse Sequence Generation
- Computer Science
- EMNLP
- 2019
This work presents a method to explicitly separate diversification from generation using a general plug-and-play module (called SELECTOR) that wraps around and guides an existing encoder-decoder model.
Hierarchical Neural Story Generation
- Computer Science
- ACL
- 2018
This work collects a large dataset of 300K human-written stories paired with writing prompts from an online forum that enables hierarchical story generation, where the model first generates a premise, and then transforms it into a passage of text.
Learning to Ask: Neural Question Generation for Reading Comprehension
- Computer Science
- ACL
- 2017
An attention-based sequence learning model for the task is introduced and the effect of encoding sentence- vs. paragraph-level information is investigated; results show that the system significantly outperforms the state-of-the-art rule-based system.
COUGH: A Challenge Dataset and Models for COVID-19 FAQ Retrieval
- Computer Science
- EMNLP
- 2021
This work analyzes COUGH by testing different FAQ retrieval models built on top of BM25 and BERT, among which the best model achieves 48.8 under P@5, indicating a great challenge presented by COUGH and encouraging future research for further improvement.
Collecting Verified COVID-19 Question Answer Pairs
- Computer Science
- NLP4COVID@EMNLP
- 2020
A dataset of over 2,100 COVID-19-related frequently asked question-answer pairs scraped from over 40 trusted websites is released, and an additional 24,000 questions pulled from online sources that have been aligned by experts with existing answered questions from this dataset are included.
On the Importance of Diversity in Question Generation for QA
- Computer Science
- ACL
- 2020
It is shown that diversity-promoting QG indeed provides better QA training than likelihood maximization approaches such as beam search, and a diversity-aware intrinsic measure of overall QG quality that correlates well with extrinsic evaluation on QA is proposed.