Syntactic Search by Example

@inproceedings{Shlain2020SyntacticSB,
  title={Syntactic Search by Example},
  author={Micah Shlain and Hillel Taub-Tabib and Shoval Sadde and Yoav Goldberg},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2020}
}
We present a system that allows a user to search a large linguistically annotated corpus using syntactic patterns over dependency graphs. In contrast to previous attempts to this effect, we introduce a light-weight query language that does not require the user to know the details of the underlying syntactic representations; instead, the user queries the corpus by providing an example sentence coupled with simple markup. Search is performed at an interactive speed due to efficient linguistic graph…
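
As a toy illustration of the query-by-example idea described in the abstract, the sketch below derives a pattern from the dependency edges of an example sentence and matches it against other hand-annotated sentences. It is not the system's implementation or query syntax: the function names `pattern_from_example` and `find_matches`, the edge triples, the dependency labels, and the sentences are all assumptions made for illustration, and the brute-force matching stands in for the indexed search a real system would need.

```python
from itertools import permutations

# Hypothetical representation: a sentence is a token list plus dependency
# edges written by hand as (head_index, label, dependent_index).

def pattern_from_example(edges, marked):
    """Keep the example's edges whose endpoints are both marked, renumbered 0..k-1."""
    index = {tok: i for i, tok in enumerate(marked)}
    return [(index[h], label, index[d])
            for h, label, d in edges if h in index and d in index]

def find_matches(pattern, k, edges, n_tokens):
    """Brute force: every assignment of k pattern nodes to distinct tokens
    that preserves all labeled pattern edges."""
    edge_set = set(edges)
    return [a for a in permutations(range(n_tokens), k)
            if all((a[h], label, a[d]) in edge_set for h, label, d in pattern)]

# Example sentence "Alice founded Acme": all three tokens are marked, and the
# word "founded" must match exactly while the other two are captured.
example_edges = [(1, "nsubj", 0), (1, "dobj", 2)]
pattern = pattern_from_example(example_edges, marked=[0, 1, 2])

corpus = [
    (["Bob", "founded", "Initech", "in", "1999"],
     [(1, "nsubj", 0), (1, "dobj", 2), (1, "prep", 3), (3, "pobj", 4)]),
    (["Carol", "joined", "Initech"],
     [(1, "nsubj", 0), (1, "dobj", 2)]),
]

hits = []
for tokens, edges in corpus:
    for a in find_matches(pattern, 3, edges, len(tokens)):
        if tokens[a[1]] == "founded":  # lexical constraint on the verb
            hits.append((tokens[a[0]], tokens[a[2]]))

print(hits)
```

Only the first corpus sentence is returned: the second has the same syntactic shape but fails the lexical constraint on the verb.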

Interactive Extractive Search over Biomedical Corpora

A light-weight query language is introduced that does not require the user to know the details of the underlying linguistic representations; instead, the user queries the corpus by providing an example sentence coupled with simple markup, which allows rapid exploration, development and refinement of queries.

Neural Extractive Search

The goals of this paper are to concisely introduce the extractive-search paradigm; and to demonstrate a prototype neural retrieval system for extractive search and its benefits and potential.

Extracting semantic relations using syntax

The rsyntax R package is introduced, which is designed to make working with dependency trees easier and more intuitive for R users, and provides a framework for combining multiple rules for reliably extracting useful semantic relations.

Pynsett: A programmable relation extractor

A programmable relation extraction method for the English language by parsing texts into semantic graphs, ideal for extracting specialized ontologies in a limited collection of documents.

Bootstrapping Relation Extractors using Syntactic Search by Examples

This work proposes a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts and takes advantage of search engines over syntactic-graphs to obtain positive examples by searching for sentences that are syntactically similar to user input examples.

GrASP: A Library for Extracting and Exploring Human-Interpretable Textual Patterns

The library integrates a first public implementation of the existing GrASP algorithm, and allows users to extract patterns using a number of general-purpose built-in linguistic attributes, as envisaged for the original algorithm.

ClozeSearch: A Collocation Retrieval Application to Assist in Scientific Writing

  • Mengru Wang, Omar Alonso
  • Computer Science
  • Proceedings of the 31st ACM International Conference on Information & Knowledge Management
  • 2022
This paper presents a slot-filling retrieval application, ClozeSearch, for searching collocates to assist users in scientific writing and proposes two alternatives based on syntactic graph and deep language model for better flexibility in coping with long queries.

Extractive Search for Analysis of Biomedical Texts

This work presents a two-stage system that creates custom datasets using a powerful mix of keyword and syntactic matching, and then returns lists of related words, which are used in downstream biomedical work.

Large Scale Substitution-based Word Sense Induction

A word-sense induction method based on pre-trained masked language models (MLMs) is presented; it scales cheaply to large vocabularies and large corpora and can induce corpus-specific senses that may not appear in standard sense inventories.

Measuring and Improving Consistency in Pretrained Language Models

The creation of PARAREL, a high-quality resource of cloze-style query English paraphrases, and analysis of the representational spaces of PLMs suggest that they have a poor structure and are currently not suitable for representing knowledge in a robust way.

References

Odinson: A Fast Rule-based Information Extraction Framework

Odinson is a rule-based information extraction framework that couples a simple yet powerful pattern language, which can operate over multiple representations of text, with a runtime system that operates in near real time to guarantee the rapid matching of patterns.

Fangorn: A System for Querying very large Treebanks

An efficient web-based system for querying very large treebanks called Fangorn, which implements an XPath-like query language which is extended with a linguistic operator to capture proximity in the terminal sequence.

Example-Based Treebank Querying

GrETEL (Greedy Extraction of Trees for Empirical Linguistics) is developed, a query engine in which linguists can use natural language examples as a starting point for searching the Lassy treebank without knowledge of tree representations or formal query languages.

Dep_search: Efficient Search Tool for Large Dependency Parsebanks

Improved versions of the syntactic analysis query toolkit dep_search, geared towards morphologically rich languages and large parsebanks, are presented, including better data indexing, a better database backend, document metadata support, API access and an improved web user interface.

Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees

This work introduces the proposed five step workflow for creating information extractors, the graph query based rule language, as well as the core features of the PROPMINER tool.

The Linguist’s Search Engine: An Overview

The Linguist's Search Engine (LSE) was designed to provide an intuitive, easy-to-use interface that enables language researchers to seek linguistically interesting examples on the Web, based on

Identifying Relations for Open Information Extraction

Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE^pos.

Exploratory Relation Extraction in Large Text Corpora

This paper proposes and demonstrates Exploratory Relation Extraction, a novel approach to identifying and extracting relations from large text corpora based on user-driven and data-guided incremental exploration and presents an interactive workflow that allows users to build extractors based on entity types and human-readable extraction patterns derived from subtrees in dependency trees.

A Domain-independent Rule-based Framework for Event Extraction

This work describes the design, development, and API of ODIN (Open Domain INformer), a domain-independent, rule-based event extraction (EE) framework that was used to develop a grammar for the biochemical domain, which approached human performance.

SeRQL: A Second Generation RDF Query Language

This position paper introduces a set of general requirements for an RDF query language, compiled from discussions between RDF implementors, experience and user feedback, and goes on to show how these requirements have been incorporated into the draft of the SeRQL query language.