Download CORD-19

This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against COVID-19. The corpus is updated regularly as new research appears in peer-reviewed publications and in archival services such as bioRxiv and medRxiv.

Immediate action is required to halt the COVID-19 pandemic and treat those it has affected. We therefore pledge to work with our partners to make intellectual property we license from them available free of charge for use in ending the COVID-19 pandemic and minimizing the impact of the disease.

By downloading this dataset you are agreeing to the Open Covid Pledge compatible Dataset License for the CORD-19 dataset, which details the terms and conditions under which partner data and content are being made available. Specific licensing information for individual articles in the dataset is available in the metadata file.

Additional licensing information is available on the PMC website, medRxiv website and bioRxiv website.

Supplemental Resources

    Description

    The dataset contains all COVID-19 and coronavirus-related research (e.g. SARS, MERS) from the following sources:

    • PubMed's PMC open access corpus using this query (COVID-19 and coronavirus research)
    • Additional COVID-19 research articles from a corpus maintained by the WHO
    • bioRxiv and medRxiv pre-prints using the same query as PMC (COVID-19 and coronavirus research)

    We also provide a comprehensive metadata file of more than 50,000 coronavirus and COVID-19 research articles with links to PubMed, Microsoft Academic and the WHO COVID-19 database of publications (includes articles without open access full text).

    We recommend using metadata from the comprehensive file when available, rather than the parsed metadata in the dataset. Please note that the dataset may contain multiple entries for an individual PMC ID when supplementary materials are available.
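
Because one PMC ID can map to several entries, it may help to group metadata rows by PMC ID before processing them. The following is a minimal sketch, not the official tooling: the column names `pmcid`, `title`, and `license`, the sample rows, and the helper `group_by_pmcid` are all assumptions made for illustration, so check them against the header of the metadata file you actually downloaded.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for the metadata file. The column names
# ("pmcid", "title", "license") and every row are illustrative assumptions.
SAMPLE = """pmcid,title,license
PMC0000001,Example article A,cc-by
PMC0000001,Example article A (supplement),cc-by
PMC0000002,Example article B,cc0
,Example article without a PMC ID,unk
"""


def group_by_pmcid(rows):
    """Group metadata rows by PMC ID, skipping rows that have none."""
    groups = defaultdict(list)
    for row in rows:
        pmcid = (row.get("pmcid") or "").strip()
        if pmcid:
            groups[pmcid].append(row)
    return dict(groups)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
groups = group_by_pmcid(rows)

# PMC IDs with more than one entry, e.g. an article plus its supplement.
duplicates = {pmcid: entries for pmcid, entries in groups.items() if len(entries) > 1}

for pmcid, entries in duplicates.items():
    print(pmcid, len(entries))
```

To run this against a real download, replace `io.StringIO(SAMPLE)` with an open file handle for the metadata file and adjust the column names as needed.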

    This repository is linked to the WHO database of publications on coronavirus disease and other resources, such as Microsoft Academic Graph, PubMed, and Semantic Scholar.

    How to Cite

    Contribute to CORD-19

    Researchers
    For research inquiries, please contact Kyle Lo (kylel@allenai.org) and Lucy Lu Wang (lucyw@allenai.org). For inquiries regarding SciSight and knowledge discovery, please contact Tom Hope (tomh@allenai.org).
    Publishers
    To maximize impact and increase the amount of full text available to the global research community, we are actively encouraging publishers to make their research content openly available for AI projects like this one that benefit the common good.

    If you’re a publisher interested in contributing to the CORD-19 corpus, please contact partnerships@allenai.org.