This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease. The corpus is updated regularly as new research is published in peer-reviewed publications and archival services like bioRxiv, medRxiv, and others.
Immediate action is required to halt the COVID-19 pandemic and treat those it has affected. We therefore pledge to work with our partners to make intellectual property we license from them available free of charge for use in ending the COVID-19 pandemic and minimizing the impact of the disease.
By downloading this dataset, you agree to the Open Covid Pledge-compatible Dataset License for the CORD-19 dataset, which details the terms and conditions under which partner data and content are being made available. Specific licensing information for individual articles in the dataset is available in the metadata file.
The dataset contains all COVID-19 and coronavirus-related research (e.g., SARS and MERS) from the following sources:
We also provide a comprehensive metadata file of more than 50,000 coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic, and the WHO COVID-19 database of publications (including articles without open-access full text).
We recommend using metadata from the comprehensive file when available, rather than the parsed metadata in the dataset. Note that the dataset may contain multiple entries for an individual PMC ID when supplementary materials are available.
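The recommendation above can be illustrated with a minimal Python sketch. It groups parsed entries by PMC ID, so that a main text and its supplementary materials stay together, and it prefers the value from the comprehensive metadata file, falling back to the parsed value only when no metadata value exists. The column names (`pmcid`, `title`, `license`), the file names, and the sample records are assumptions made for illustration and are not taken from the dataset itself.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for the comprehensive metadata file. The column
# names ("pmcid", "title", "license") are assumptions for this sketch.
METADATA_CSV = """pmcid,title,license
PMC1,Curated title A,cc-by
PMC2,Curated title B,no-cc
"""

# Hypothetical parsed entries. PMC1 appears twice: once for the main text
# and once for a supplementary file.
parsed_entries = [
    {"pmcid": "PMC1", "title": "parsed title a", "file": "PMC1.json"},
    {"pmcid": "PMC1", "title": "", "file": "PMC1.supp.json"},
    {"pmcid": "PMC3", "title": "Only parsed", "file": "PMC3.json"},
]


def load_metadata(text):
    """Index the metadata rows by PMC ID."""
    return {row["pmcid"]: row for row in csv.DictReader(io.StringIO(text))}


def group_by_pmcid(entries):
    """Collect every parsed entry that shares a PMC ID."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["pmcid"]].append(entry)
    return dict(groups)


def resolve_title(pmcid, entries, metadata):
    """Prefer the metadata file's title; fall back to the first parsed one."""
    row = metadata.get(pmcid)
    if row and row.get("title"):
        return row["title"]
    for entry in entries:
        if entry.get("title"):
            return entry["title"]
    return None


metadata = load_metadata(METADATA_CSV)
grouped = group_by_pmcid(parsed_entries)
titles = {
    pmcid: resolve_title(pmcid, entries, metadata)
    for pmcid, entries in grouped.items()
}
```

Here `grouped["PMC1"]` holds both entries for that article, and `titles` takes the curated title for PMC1 while PMC3, which has no metadata row in this toy example, keeps its parsed title. The same fallback pattern would apply to other fields, such as the per-article license.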
This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar.