Corpus ID: 216056360

CORD-19: The COVID-19 Open Research Dataset

@article{Wang2020CORD19TC,
  title={CORD-19: The COVID-19 Open Research Dataset},
  author={Lucy Lu Wang and Kyle Lo and Yoganand Chandrasekhar and Russell Reas and Jiangjiang Yang and Darrin Eide and Kathryn Funk and Rodney Michael Kinney and Ziyang Liu and William Cooper Merrill and Paul Mooney and Dewey A. Murdick and Devvret Rishi and Jerry Sheehan and Zhihong Shen and Brandon Stilson and Alex D Wade and Kuansan Wang and Christopher Wilhelm and Boya Xie and Douglas A. Raymond and Daniel S. Weld and Oren Etzioni and Sebastian Kohlmeier},
  journal={ArXiv},
  year={2020}
}
The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research. CORD-19 is designed to facilitate the development of text mining and information retrieval systems over its rich collection of metadata and structured full-text papers. Since its release, CORD-19 has been downloaded over 200K times and has served as the basis of many COVID-19 text mining and discovery systems. In this article, we describe the mechanics… 


COVIDSeer: Extending the CORD-19 Dataset
TLDR
An enhanced version of CORD-19 dataset is developed and a vertical search engine COVIDSeer is built based on the new dataset, which offers keyphrase-enhanced search and will hopefully help biomedical and life science researchers, medical students, and the general public to more effectively explore coronavirus-related literature.
COVID19 Drug Repository: text-mining the literature in search of putative COVID19 therapeutics
TLDR
The COVID19 Drug Repository enables users to focus on different levels of complexity, starting from general information about (FDA-) approved drugs, PubMed references, clinical trials, recipes as well as the descriptions of molecular mechanisms of drugs’ action.
A scientometric overview of CORD-19
TLDR
Based on a comparison to the Web of Science database, it is found that CORD-19 provides an almost complete coverage of research on COVID-19 and coronaviruses.
Repurposing TREC-COVID Annotations to Answer the Key Questions of CORD-19
TLDR
This work repurposes the relevance annotations from the TREC-COVID tasks to identify journal articles in CORD-19 that are relevant to the key questions posed by CORD-19, and presents the methodology used to construct the new dataset.
Covid-on-the-Web: Knowledge Graph and Services to Advance COVID-19 Research
TLDR
The Covid-on-the-Web project aims to allow biomedical researchers to access, query and make sense of COVID-19 related literature, and adapts, combines and extends tools to process, analyze and enrich the "COVID-19 Open Research Dataset" (CORD-19).
Using Machine Learning Algorithms for Finding the Topics of COVID-19 Open Research Dataset Automatically
TLDR
The topic modeling pipeline presented in this thesis helps researchers gain an overview of the topics addressed in the papers of COVID-19, SARS-CoV-2, and related coronaviruses curated by the Allen Institute for AI.
LitCovid: an open database of COVID-19 literature
TLDR
LitCovid is the first-of-its-kind COVID-19-specific literature resource, with all of its collected articles and curated data freely available, and has been widely used, with millions of accesses by users worldwide for various information needs.
COVIDSeer: Filling missing pieces in the CORD-19 dataset
TLDR
An enhanced version of the CORD-19 dataset is developed and a vertical search engine COVIDSeer is built based on the new dataset, which offers keyphrase-enhanced search and will hopefully help biomedical and life science researchers, medical students, and the general public to explore coronavirus-related literature more effectively.
Synthetic Target Domain Supervision for Open Retrieval QA
TLDR
This work stress-tests the Dense Passage Retriever (DPR), a state-of-the-art (SOTA) open-domain neural retrieval model, on closed and specialized target domains such as COVID-19, and finds that it lags behind standard BM25 in this important real-world setting.
...
...

References

Showing 1-10 of 47 references
Information Mining for COVID-19 Research From a Large Volume of Scientific Literature
TLDR
A graph-based model is developed using abstracts of 10,683 scientific articles to find key information on three topics: transmission, drug types, and genome research related to coronavirus to expedite and recommend new and alternative directions for COVID-19 research.
COVID-19 and Inflammatory Bowel Diseases: risk assessment, shared molecular pathways and therapeutic challenges
TLDR
Using current understanding of SARS-CoV-2 as well as other pathogenic coronaviruses immunopathology, it is shown why IBD patients should not be considered at an increased risk of infection or more severe outcomes.
Comprehensive Named Entity Recognition on CORD-19 with Distant or Weak Supervision
TLDR
This CORD-NER dataset with comprehensive named entity recognition (NER) on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus covers 75 fine-grained entity types, which may benefit research on COVID-19 related viruses, spreading mechanisms, and potential vaccines.
Identifying Radiological Findings Related to COVID-19 from Medical Literature
TLDR
This work develops natural language processing methods to analyze a large collection of COVID-19 literature containing study reports from hospitals all over the world, reconcile these results, and draw unbiased and universally sensible conclusions about the correlation between radiological findings and COVID-19.
TREC-COVID: Constructing a Pandemic Information Retrieval Test Collection
TREC-COVID is a community evaluation designed to build a test collection that captures the information needs of biomedical researchers using the scientific literature during a pandemic. One of the
Tuberculosis and COVID-19 in 2020: lessons from the past viral outbreaks and possible future outcomes
TLDR
Investigating the pathological pathways linking TB and SARS-CoV-2 leads to the idea that their coexistence might yield a more severe clinical evolution, and the issues of vaccination and diagnostic reliability in the context of coinfection are addressed.
TREC-COVID: rationale and structure of an information retrieval shared task for COVID-19
TLDR
TREC-COVID differs from traditional IR shared task evaluations with special considerations for the expected users, IR modality considerations, topic development, participant requirements, assessment process, relevance criteria, evaluation metrics, iteration process, projected timeline, and the implications of data use as a post-task test collection.
Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset
The Neural Covidex is a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset (CORD-19) curated by the Allen
Exploring the SARS-CoV-2 virus-host-drug interactome for drug repurposing
TLDR
CoVex renders COVID-19 drug research systems-medicine-ready by giving the scientific community direct access to network medicine algorithms and investigates recent hypotheses on a systems biology level to explore mechanistic virus life cycle drivers, and to extract drug repurposing candidates.
PMC text mining subset in BioC: about three million full-text articles and growing
TLDR
To facilitate automated processing of nearly 3 million full-text articles (in the PMC Open Access and Author Manuscript subsets) and to improve interoperability, the articles are converted into BioC, a simple, community-driven data structure in either XML or JSON format.
...
...