Extracting a Knowledge Base of Mechanisms from COVID-19 Papers

@article{Amini2021ExtractingAK,
  title={Extracting a Knowledge Base of Mechanisms from COVID-19 Papers},
  author={Aida Amini and Tom Hope and David Wadden and Madeleine van Zuylen and Eric Horvitz and Roy Schwartz and Hannaneh Hajishirzi},
  journal={ArXiv},
  year={2021},
  volume={abs/2010.03824}
}
The COVID-19 pandemic has spawned a diverse body of scientific literature that is challenging to navigate, stimulating interest in automated tools to help find useful knowledge. We pursue the construction of a knowledge base (KB) of mechanisms—a fundamental concept across the sciences, which encompasses activities, functions and causal relations, ranging from cellular processes to economic impacts. We extract this information from the natural language of scientific papers by developing a broad… 

A Search Engine for Discovery of Scientific Challenges and Directions
TLDR
A novel task of extraction and search of scientific challenges and directions, to facilitate rapid knowledge discovery on a large corpus of interdisciplinary work relating to the COVID-19 pandemic, ranging from biomedicine to areas such as AI and economics.
Extracting a Knowledge Base of COVID-19 Events from Social Media
TLDR
A manually annotated corpus of 10,000 tweets containing public reports of five COVID-19 events, including positive and negative tests, deaths, denied access to testing, claimed cures and preventions, shows that it can support fine-tuning BERT-based classifiers to automatically extract publicly reported events and help track the spread of a new disease.
Data Models for Annotating Biomedical Scholarly Publications: the Case of CORD-19
TLDR
This systematic review provides an analysis of the data models that have been applied to semantic annotation projects for the scholarly publications available in the CORD-19 dataset, an open database of the full texts of scholarly publications about COVID-19.
Queries related to COVID-19: a more effective retrieval through finetuned ALBERT with BM25L question answering system
TLDR
A finetuned ALBERT-based QA system, combined with the Best Match 25 (Okapi BM25) ranking function and its variant BM25L for context retrieval, provided high scores on benchmark data sets such as SQuAD for answers to COVID-19 questions.
CovRelex: A COVID-19 Retrieval System with Relation Extraction
TLDR
CovRelex is a scientific paper retrieval system that targets entities and relations via relation extraction on COVID-19 scientific papers, aiming to help users efficiently acquire knowledge across a huge number of rapidly published COVID-19 papers.
A Computational Inflection for Scientific Discovery
TLDR
The confluence of societal and computational trends suggests that computer science is poised to ignite a revolution in the scientific process itself.
SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts
TLDR
This work presents a new task of hierarchical CDCR for concepts in scientific papers, with the goal of jointly inferring coreference clusters and hierarchy between them and creates SCICO, an expert-annotated dataset for this task.
DiSCoMaT: Distantly Supervised Composition Extraction from Tables in Materials Science Articles
TLDR
This work observes that materials science researchers organize similar compositions in a wide variety of table styles, necessitating an intelligent model for table understanding and composition extraction, and presents DiSCoMaT, a strong baseline geared towards this task, which outperforms recent table processing architectures by significant margins.
Predicting Informativeness of Semantic Triples
TLDR
This work uses full texts of biomedical publications to create a training corpus of informative and important semantic triples based on the notion that the main contributions of an article are summarized in its abstract, and suggests that an importance ranking for semantic triples could also be generated.
