Swanson linking revisited: Accelerating literature-based discovery across domains using a conceptual influence graph

Gus Hahn-Powell, Marco Antonio Valenzuela-Escarcega, Mihai Surdeanu
We introduce a modular approach for literature-based discovery consisting of machine reading and knowledge assembly components that together produce a graph of influence relations (e.g., “A promotes B”) from a collection of publications. A search engine is used to explore direct and indirect influence chains. Query results are substantiated with textual evidence, ranked according to their relevance, and presented in both a table-based view and a network graph visualization.
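As a rough illustration of what exploring direct and indirect influence chains can mean, the sketch below stores influence relations as a directed graph and enumerates bounded-length chains between two concepts. It is not the authors' search engine: the function names, the `max_hops` parameter, and the toy relations over concepts A to D are all hypothetical, and evidence tracking and relevance ranking are omitted.

```python
from collections import defaultdict


def build_graph(relations):
    """Build an adjacency list from (cause, effect) pairs (hypothetical format)."""
    graph = defaultdict(list)
    for cause, effect in relations:
        graph[cause].append(effect)
    return graph


def influence_chains(graph, source, target, max_hops=3):
    """Return every cycle-free chain from source to target with at most
    max_hops edges, shortest chains first."""
    chains = []
    stack = [[source]]  # depth-first search over partial chains
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target and len(path) > 1:
            chains.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a concept within one chain
                stack.append(path + [nxt])
    return sorted(chains, key=len)


# Toy relations, invented for illustration: "A promotes B", "B promotes C", ...
relations = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
graph = build_graph(relations)

direct_and_indirect = influence_chains(graph, "A", "C")
only_short = influence_chains(graph, "A", "D", max_hops=2)
```

A chain with one edge is a direct influence; longer chains are the indirect links that literature-based discovery looks for. Capping the hop count keeps the enumeration tractable on a large graph.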

Construction of the Literature Graph in Semantic Scholar

This paper reduces literature graph construction to familiar NLP tasks, points out research challenges arising from differences from the standard formulations of these tasks, and reports empirical results for each task.

Enabling Search and Collaborative Assembly of Causal Interactions Extracted from Multilingual and Multi-domain Free Text

A system that incorporates multi-domain extractions of causal interactions into a single searchable knowledge graph, enabling users to search iteratively over direct and indirect connections in this knowledge graph and to collaboratively build causal models in real time.

Scientific Discovery as Link Prediction in Influence and Citation Graphs

A machine learning approach for the identification of “white spaces” in scientific knowledge is introduced, which predicts which influence links will be discovered in the “near future” with an F1 score of 27 points and a mean average precision of 68%.

Eidos, INDRA, & Delphi: From Free Text to Executable Causal Models

This paper introduces an approach that builds executable probabilistic models from raw, free text using Eidos, an open-domain machine reading system designed to extract causal relations from natural language, together with INDRA and Delphi.

Mining Academic Publications to Predict Automation

This work is unable to link the co-occurrences found in academic publications to automation in the labor force due to a dearth of automation data, but future work conducted when such data is available could apply a similar approach with the aim of predicting automation from trends in academic research and publications.



Extracting Complex Biological Events with Rich Graph-Based Feature Sets

A system for extracting complex events among genes and proteins from biomedical literature, developed in the context of the BioNLP'09 Shared Task on Event Extraction, which defines a wide array of features and makes extensive use of dependency parse graphs.

This before That: Causal Precedence in the Biomedical Domain

A novel, hand-annotated text corpus of causal precedence in the biomedical domain is described, and a sieve-based architecture is applied to capitalize on the lack of overlap between the predictions of individual models, achieving a micro F1 score of 46 points.

Conceptual biology, hypothesis discovery, and text mining: Swanson's legacy

Innovative biomedical librarians and information specialists who want to expand their roles as expert searchers need to know about profound changes in biology and parallel trends in text mining.

Using ARROWSMITH: a computer-assisted approach to formulating and assessing scientific hypotheses.

PathwayMatrix: visualizing binary relationships between proteins in biological pathways

This paper introduces PathwayMatrix, a visualization tool that presents the binary relations between proteins in the pathway via the use of an interactive adjacency matrix and provides filtering, lensing, clustering, and brushing and linking capabilities in order to present relevant details about proteins within a pathway.

Deterministic Coreference Resolution Based on Entity-Centric, Precision-Ranked Rules

The two stages of the sieve-based architecture, a mention detection stage that heavily favors recall, followed by coreference sieves that are precision-oriented, offer a powerful way to achieve both high precision and high recall.

Sieve-based Coreference Resolution in the Biomedical Domain

A rule-based architecture that uses sequentially applied hand-designed "sieves", with the output of each sieve informing and constraining subsequent sieves, that provides a 3.2% increase in throughput to the Reach event extraction system with precision parallel to that of the stricter system that relies solely on syntactic patterns for extraction.

A Machine Reading System for Assembling Synthetic Paleontological Databases

A machine reading system that automatically locates and extracts data from heterogeneous text, tables, and figures in publications is developed and its quality validated, and it is shown that the system can readily accommodate sophisticated data types, such as morphological data in biological illustrations and associated textual descriptions.

Undiscovered Public Knowledge

Knowledge can be public, yet undiscovered, if independently created fragments are logically related but never retrieved, brought together, and interpreted. Information retrieval, although essential

Toward an Architecture for Never-Ending Language Learning

This work proposes an approach and a set of design principles for an intelligent computer agent that runs forever and describes a partial implementation of such a system that has already learned to extract a knowledge base containing over 242,000 beliefs.