Interactive Extractive Search over Biomedical Corpora

Hillel Taub-Tabib, Micah Shlain, Shoval Sadde, Daniel Lahav, Matan Eyal, Yaara Cohen, Yoav Goldberg
We present a system that allows life-science researchers to search a linguistically annotated corpus of scientific texts using patterns over dependency graphs, as well as patterns over token sequences and a powerful variant of boolean keyword queries. In contrast to previous attempts at dependency-based search, we introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations, and instead lets them query the corpus by…
A Search Engine for Discovery of Scientific Challenges and Directions
A novel task of extraction and search of scientific challenges and directions, to facilitate rapid knowledge discovery on a large corpus of interdisciplinary work relating to the COVID-19 pandemic, ranging from biomedicine to areas such as AI and economics.
Text mining approaches for dealing with the rapidly expanding literature on COVID-19
This review discusses the corpora, modeling resources, systems and shared tasks that have been introduced for COVID-19, and lists 39 systems that provide functionality such as search, discovery, visualization and summarization over the COVID-19 literature.
Neural Extractive Search
The goals of this paper are to concisely introduce the extractive-search paradigm and to demonstrate a prototype neural retrieval system for extractive search, along with its benefits and potential.
Evolutionary Algorithm Based Summarization for Analyzing COVID-19 Medical Reports
  • Chirantana Mallick, Sunanda Das, Asit Kumar Das
  • Computer Science
  • Understanding COVID-19: The Role of Computational Intelligence
  • 2021
This chapter extracts important information about COVID-19 from available text documents, such as research papers, articles, journals, reports, and other publications, and evaluates the proposed method by comparing it with a few related state-of-the-art methods.
The Biomaterials Annotator: a system for ontology-based concept annotation of biomaterials text
This work developed a semantic annotator specifically tailored for the biomaterials literature and makes both the corpus and system available to the community to promote future efforts in the field and contribute towards its sustainability.
Use of Formal Ethical Reviews in NLP Literature: Historical Trends and Current Practices
A detailed quantitative and qualitative analysis of the ACL Anthology is conducted, and the trends in the field are compared to those of other related disciplines, such as cognitive science, machine learning, data mining, and systems.
COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation
A novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements (entities, relations and events) from scientific literature and exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation.
CSFCube - A Test Collection of Computer Science Research Articles for Faceted Query by Example
This work introduces the task of faceted Query by Example, in which users can specify a finer-grained aspect in addition to the input query document, and describes an expert-annotated test collection to evaluate models trained to perform this task.


Syntactic Search by Example
A light-weight query language is introduced that does not require the user to know the details of the underlying syntactic representations, and instead lets them query the corpus by providing an example sentence coupled with simple markup.
Exploratory Relation Extraction in Large Text Corpora
This paper proposes and demonstrates Exploratory Relation Extraction, a novel approach to identifying and extracting relations from large text corpora based on user-driven and data-guided incremental exploration. It also presents an interactive workflow that allows users to build extractors based on entity types and human-readable extraction patterns derived from subtrees in dependency trees.
Attending to All Mention Pairs for Full Abstract Biological Relation Extraction
This work proposes a model that considers all mention and entity pairs simultaneously in order to make a prediction, and that achieves the state of the art on the BioCreative V Chemical Disease Relation dataset for models without KB resources, outperforming ensembles of models which use hand-crafted features and additional linguistic resources.
Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees
This work introduces the proposed five-step workflow for creating information extractors, the graph-query-based rule language, and the core features of the PROPMINER tool.
Odinson: A Fast Rule-based Information Extraction Framework
Odinson is a rule-based information extraction framework that couples a simple yet powerful pattern language, which can operate over multiple representations of text, with a runtime system that operates in near real time to guarantee the rapid matching of patterns.
The NLM Medical Text Indexer System for Indexing Biomedical Literature
An overview of MTI’s functionality, performance, and its evolution over the years is provided.
ExaCT: automatic extraction of clinical trial characteristics from journal publications
An automatic information extraction system that assists users with locating and extracting key trial characteristics from full-text journal articles reporting on randomized controlled trials (RCTs) and can be extended to handle other characteristics and document types.
Cross-Sentence N-ary Relation Extraction with Graph LSTMs
A general relation extraction framework based on graph long short-term memory networks (graph LSTMs) that can be easily extended to cross-sentence n-ary relation extraction is explored, demonstrating its effectiveness with both conventional supervised learning and distant supervision.
ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing
ScispaCy, a new Python library and set of models for practical biomedical/scientific text processing that heavily leverages the spaCy library, is described; the paper details the performance of two packages of models released in scispaCy and demonstrates their robustness on several tasks and datasets.
LIVIVO – the Vertical Search Engine for Life Sciences
Future work will focus on the exploitation of life science ontologies and on the employment of NLP technologies in order to improve query expansion, filters in faceted search, and concept-based relevancy rankings in LIVIVO.