Text mining approaches for dealing with the rapidly expanding literature on COVID-19

Lucy Lu Wang and Kyle Lo
Briefings in Bioinformatics, pages 781-799
Abstract: More than 50 000 papers have been published about COVID-19 since the beginning of 2020 and several hundred new papers continue to be published every day. This incredible rate of scientific productivity leads to information overload, making it difficult for researchers, clinicians and public health officials to keep up with the latest findings. Automated text mining techniques for searching, reading and summarizing papers are helpful for addressing information overload. In this review…

Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals
The main idea is to extract as much semi-structured information from the text corpus as possible, using named entity recognition (NER) with a model called PubMedBERT and the Text Analytics for Health service, then store the data in a NoSQL database for fast further processing and insight generation.
Overview of the COVID-19 Text Mining Tool Interactive Demo Track in BioCreative VII
The BioCreative COVID-19 text mining tool interactive demo track was created to gauge user-system compliance and establish a two-way communication channel between system developers and potential end users to provide system designers with useful feedback on the performance and usability of their tools.
Question Answering Systems for Covid-19
This survey describes the QA systems available for COVID-19: CovidQA, the CAiRE (Center for Artificial Intelligence Research)-COVID system, the CO-Search semantic search engine, COVIDASK, and RECORD (Research Engine for COVID Open Research Dataset).
Analyzing the research trends of COVID-19 using topic modeling approach
Purpose The COVID-19 pandemic has impacted 222 countries across the globe, with millions of people losing their lives. The threat from the virus may be assessed from the fact that most countries
A Survey for News Credibility in Social Networks
This survey covers how text mining can be used to determine the credibility of news on social media, which could serve as a basis for future text mining research.
Prioritization of COVID-19-Related Literature via Unsupervised Keyphrase Extraction and Document Representation Learning
In this work, the current body of COVID-19-related literature is annotated using unsupervised keyphrase extraction, facilitating the initial queries to the latent space containing the learned document embeddings (low-dimensional representations), which is accessible through a web server capable of interactive search, term ranking, and exploration of potentially interesting literature.
Searching for scientific evidence in a pandemic: An overview of TREC-COVID
Meta-research on COVID-19: An overview of the early trends
It is speculated that some aspects of doing research during COVID-19 are more likely to persist than others, including the shift to virtual academic events such as conferences; the use of openly accessible pre-prints; and the ‘datafication’ of scholarly literature and consequent broader adoption of machine learning in science communication.
Classifying domain-specific text documents containing ambiguous keywords
A number of classification algorithms for identifying a domain-specific set of papers about echinoderm species are evaluated and it is shown how effective the resulting classifiers are in filtering irrelevant articles returned from PubMed.


COVID-19 Knowledge Graph: Accelerating Information Retrieval and Discovery for Scientific Literature
This work presents the COVID-19 Knowledge Graph (CKG), a heterogeneous graph for extracting and visualizing complex relationships between COVID-19 scientific articles, and proposes a document similarity engine that leverages low-dimensional graph embeddings from the CKG with semantic embeddings for similar article retrieval.
Interactive Extractive Search over Biomedical Corpora
A light-weight query language is introduced that does not require the user to know the details of the underlying linguistic representations, and instead to query the corpus by providing an example sentence coupled with simple markup, allowing for rapid exploration, development and refinement of user queries.
Automatic Textual Evidence Mining in COVID-19 Literature
EVIDENCEMINER is a web-based system that lets users query a natural language statement and automatically retrieves textual evidence from a background corpus for life sciences; it is constructed in a completely automated way, without any human effort for training data annotation.
A survey of current work in biomedical text mining
Meeting the major challenges of biomedical text mining over the next 5-10 years will require enhanced access to full text, better understanding of the feature space of biomedical literature, better methods for measuring the usefulness of systems to users, and continued cooperation with the biomedical research community to ensure that their needs are addressed.
CORD-19: The COVID-19 Open Research Dataset
The mechanics of dataset construction are described, highlighting challenges and key design decisions, an overview of how CORD-19 has been used, and several shared tasks built around the dataset are described.
BEST: Next-Generation Biomedical Entity Search Tool for Knowledge Discovery from Biomedical Literature
BEST, a biomedical entity search tool, is introduced as the only system that processes free-text queries and returns up-to-date results in real time, including mutation information.
Text mining and its potential applications in systems biology.
Frontiers of biomedical text mining: current progress
The current state of the art in biomedical text mining or 'BioNLP' in general is reviewed, focusing primarily on papers published within the past year.
EVIDENCEMINER: Textual Evidence Discovery for Life Sciences
EVIDENCEMINER is a web-based system that lets users query a natural language statement and automatically retrieves textual evidence from a background corpus for life sciences, supported by novel data-driven methods for distantly supervised named entity recognition and open information extraction.
CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization
CO-Search is presented, a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers during a time of crisis.