Interactive Extractive Search over Biomedical Corpora

Hillel Taub-Tabib, Micah Shlain, Shoval Sadde, Dan Lahav, Matan Eyal, Yaara Cohen, Yoav Goldberg
We present a system that allows life-science researchers to search a linguistically annotated corpus of scientific texts using patterns over dependency graphs, as well as using patterns over token sequences and a powerful variant of boolean keyword queries. In contrast to previous attempts at dependency-based search, we introduce a light-weight query language that does not require the user to know the details of the underlying linguistic representations, and instead to query the corpus by…

Extractive Search for Analysis of Biomedical Texts

This work presents a two-stage system that creates custom datasets using a powerful mix of keyword and syntactic matching, and then returns lists of related words, which are used in downstream biomedical work.

A Search Engine for Discovery of Scientific Challenges and Directions

A novel task of extraction and search of scientific challenges and directions, to facilitate rapid knowledge discovery on a large corpus of interdisciplinary work relating to the COVID-19 pandemic, ranging from biomedicine to areas such as AI and economics.

CSFCube - A Test Collection of Computer Science Research Articles for Faceted Query by Example

This work introduces the task of faceted Query by Example in which users can also specify a finer grained aspect in addition to the input query document, and describes an expert annotated test collection to evaluate models trained to perform this task.

Text mining approaches for dealing with the rapidly expanding literature on COVID-19

This review discusses the corpora, modeling resources, systems and shared tasks that have been introduced for COVID-19, and lists 39 systems that provide functionality such as search, discovery, visualization and summarization over the COVID-19 literature.

Neural Extractive Search

The goals of this paper are to concisely introduce the extractive-search paradigm; and to demonstrate a prototype neural retrieval system for extractive search and its benefits and potential.

Hybrid Search based Enhanced Named Entity Annotation Tool

A novel hybrid search-based enhanced annotation tool that offers an easy-to-use GUI and several search modes to accelerate the annotation exercise, yielding faster annotation than typical annotators and performance comparable to state-of-the-art tools.

Rapid Knowledgebase Construction and Hypotheses Generation Using Extractive Literature Search

This work presents a methodology and a supporting tool to allow individual researchers or small teams, without background in bio-curation or computer science, to mine the scientific literature and construct ad-hoc, personalized, and literature-anchored knowledgebases, that are tailored around their specific research interests and support their scientific goals.

Past and future uses of text mining in ecology and evolution

Applying computational tools from text mining and NLP will increase the efficiency of data synthesis, improve the reproducibility of literature reviews, formalize analyses of research biases and knowledge gaps, and promote data-driven discovery of patterns across ecology and evolutionary biology.

A Computational Inflection for Scientific Discovery

The confluence of societal and computational trends suggests that computer science is poised to ignite a revolution in the scientific process itself.

Evolutionary Algorithm Based Summarization for Analyzing COVID-19 Medical Reports

This chapter tries to extract important information about COVID-19 from the available text documents, such as research papers, articles, journals, reports, and other publications, and evaluates the proposed method by comparing it with a few related state-of-the-art methods.



Syntactic Search by Example

A light-weight query language is introduced that does not require the user to know the details of the underlying syntactic representations, and instead lets them query the corpus by providing an example sentence coupled with simple markup.

Exploratory Relation Extraction in Large Text Corpora

This paper proposes and demonstrates Exploratory Relation Extraction, a novel approach to identifying and extracting relations from large text corpora based on user-driven and data-guided incremental exploration and presents an interactive workflow that allows users to build extractors based on entity types and human-readable extraction patterns derived from subtrees in dependency trees.

Attending to All Mention Pairs for Full Abstract Biological Relation Extraction

This work proposes a model to consider all mention and entity pairs simultaneously in order to make a prediction, which achieves the state of the art on the Biocreative V Chemical Disease Relation dataset for models without KB resources, outperforming ensembles of models which use hand-crafted features and additional linguistic resources.

pyBART: Evidence-based Syntactic Transformations for IE

This work introduces a broad-coverage, data-driven and linguistically sound set of transformations, that makes event-structure and many lexical relations explicit, and presents pyBART, an easy-to-use open-source Python library for converting English UD trees either to Enhanced UD graphs or to the authors' representation.

Odinson: A Fast Rule-based Information Extraction Framework

This work presents Odinson, a rule-based information extraction framework that couples a simple yet powerful pattern language, which can operate over multiple representations of text, with a runtime system that operates in near real time to guarantee the rapid matching of patterns.

The NLM Medical Text Indexer System for Indexing Biomedical Literature

An overview of MTI’s functionality, performance, and its evolution over the years is provided.

ExaCT: automatic extraction of clinical trial characteristics from journal publications

An automatic information extraction system that assists users with locating and extracting key trial characteristics from full-text journal articles reporting on randomized controlled trials (RCTs) and can be extended to handle other characteristics and document types.

Cross-Sentence N-ary Relation Extraction with Graph LSTMs

A general relation extraction framework based on graph long short-term memory networks (graph LSTMs) that can be easily extended to cross-sentence n-ary relation extraction is explored, demonstrating its effectiveness with both conventional supervised learning and distant supervision.

ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing

This work describes scispaCy, a new Python library and set of models for practical biomedical/scientific text processing that heavily leverages the spaCy library; the authors detail the performance of two packages of models released in scispaCy and demonstrate their robustness on several tasks and datasets.

LIVIVO – the Vertical Search Engine for Life Sciences

Future work will focus on the exploitation of life science ontologies and on the employment of NLP technologies in order to improve query expansion, filters in faceted search, and concept based relevancy rankings in LIVIVO.