• Corpus ID: 215238980

MedDialog: A Large-scale Medical Dialogue Dataset

@article{Chen2020MedDialogAL,
  title={MedDialog: A Large-scale Medical Dialogue Dataset},
  author={Shu Chen and Zeqian Ju and Xiangyu Dong and Hongchao Fang and Sicheng Wang and Yue Yang and Jiaqi Zeng and Ruisi Zhang and Ruoyu Zhang and Meng Zhou and Penghui Zhu and Pengtao Xie},
  journal={ArXiv},
  year={2020},
  volume={abs/2004.03329}
}
Medical dialogue systems are promising in assisting in telemedicine to increase access to healthcare services, improve the quality of patient care, and reduce medical costs. To facilitate the research and development of medical dialogue systems, we build a large-scale medical dialogue dataset, MedDialog, that contains 1.1 million conversations between patients and doctors and 4 million utterances. To the best of our knowledge, MedDialog is the largest medical dialogue dataset to date. The dataset…

M^2-MedDialog: A Dataset and Benchmarks for Multi-domain Multi-service Medical Dialogues

This work builds a Multiple-domain Multiple-service medical dialogue dataset, which contains 1,557 conversations between doctors and patients, covering 276 types of diseases, 2,468 medical entities, and 3 specialties of medical services, and formulates a one-stop medical dialogue system (MDS) as a sequence-to-sequence generation problem.

Dual Memory Network for Medical Dialogue Generation

  • Zongli Jiang, Jia Xu, Jinli Zhang, Fenglong Ma, Jianqiang Li
  • Computer Science
    2022 IEEE 24th Int Conf on High Performance Computing & Communications; 8th Int Conf on Data Science & Systems; 20th Int Conf on Smart City; 8th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys)
  • 2022
DM-Net consists of two major memory modules: the encoding of the dialogue history is used as a query over a dialogue context memory and a clinical experience memory, to achieve a correct understanding of the dialogue content and sensible reasoning over external knowledge, thereby generating accurate and reliable responses.

Building blocks of a task-oriented dialogue system in the healthcare domain

This paper proposes a framework for developing a dialogue system and shows preliminary results of simulated dialogue data generation by utilising expert knowledge and crowd-sourcing.

A Spoken Drug Prescription Dataset in French for Spoken Language Understanding

This paper presents the first spoken medical drug prescription corpus, named PxNLU, which contains 4 hours of transcribed and annotated dialogues of drug prescriptions in French, acquired through an experiment with 55 participants, both experts and non-experts in prescriptions.

A benchmark for automatic medical consultation system: frameworks, tasks and datasets

This article creates a new large medical dialogue dataset with multi-level fine-grained annotations and establishes five independent tasks, including named entity recognition, dialogue act classification, symptom label inference, medical report generation and diagnosis-oriented dialogue policy.

Knowledge grounded medical dialogue generation using augmented graphs

A general method to embed the triples in each graph into large-scalable models and thereby generate clinically correct responses based on the conversation history using the recently released MedDialog(EN) dataset, and fine-tune the proposed Masked Entity Dialogue (MED) model on smaller corpora.

DialMed: A Dataset for Dialogue-based Medication Recommendation

This work constructs DIALMED, the first high-quality dataset for the medical dialogue-based medication recommendation task, and proposes a Dialogue structure and Disease knowledge aware Network (DDN), where a QA Dialogue Graph mechanism is designed to model the dialogue structure and a knowledge graph is used to introduce external disease knowledge.

CDialog: A Multi-turn Covid-19 Conversation Dataset for Entity-Aware Dialog Generation

This work makes the very first attempt to release a high-quality multi-turn medical dialog dataset on Covid-19, named CDialog, with over 1K conversations collected from online medical counselling websites.

ReMeDi: Resources for Multi-domain, Multi-service, Medical Dialogues

This work describes the creation of the ReMeDi dataset and the ReMeDi benchmarking methods, and establishes experimental results using those benchmarking methods for future research to compare against.

Medical Dialogue Response Generation with Pivotal Information Recalling

A medical response generation model with Pivotal Information Recalling (MedPIR), which is built on two components, i.e., knowledge-aware dialogue graph encoder and recall-enhanced generator, which outperforms the strong baselines in BLEU scores and medical entities F1 measure.

On the Generation of Medical Dialogues for COVID-19

This work collects two CovidDialog dialogue datasets (in English and Chinese, respectively) containing conversations between doctors and patients about COVID-19, and trains several dialogue generation models based on Transformer, GPT, and BERT-GPT to develop a medical dialogue system that can provide COVID-19-related consultations.

Task-oriented Dialogue System for Automatic Diagnosis

Experimental results on this dialogue system show that additional symptoms extracted from conversation can greatly improve the accuracy for disease identification and the dialogue system is able to collect these symptoms automatically and make a better diagnosis.

End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis

An End-to-End Knowledge-routed Relational Dialogue System (KR-DS) is proposed that seamlessly incorporates a rich medical knowledge graph into the topic transition in dialogue management and makes it cooperate with natural language understanding and natural language generation.

DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation

It is shown that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems.

How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

This work investigates evaluation metrics for dialogue response generation systems where supervised labels, such as task completion, are not available and shows that these metrics correlate very weakly with human judgements in the non-technical Twitter domain, and not at all in the technical Ubuntu domain.

A Diversity-Promoting Objective Function for Neural Conversation Models

This work proposes using Maximum Mutual Information (MMI) as the objective function in neural models, and demonstrates that the proposed MMI models produce more diverse, interesting, and appropriate responses, yielding substantive gains in BLEU scores on two conversational datasets and in human evaluations.

Hierarchical Neural Story Generation

This work collects a large dataset of 300K human-written stories paired with writing prompts from an online forum that enables hierarchical story generation, where the model first generates a premise, and then transforms it into a passage of text.

Importance-Aware Learning for Neural Headline Editing

An encoder-decoder model which leverages large scale pre-trained language models and Self Importance-Aware (SIA) loss to address the different levels of editing in the dataset by down-weighting the importance of easily classified tokens and sentences.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improves the LSTM's performance markedly, because doing so introduces many short-term dependencies between the source and the target sentence, which makes the optimization problem easier.

Generating Informative and Diverse Conversational Responses via Adversarial Information Maximization

Adversarial Information Maximization (AIM), an adversarial learning framework that addresses informativeness and diversity, and explicitly optimizes a variational lower bound on pairwise mutual information between query and response.