Syntactic Search by Example

Micah Shlain, Hillel Taub-Tabib, Shoval Sadde, Yoav Goldberg
We present a system that allows a user to search a large linguistically annotated corpus using syntactic patterns over dependency graphs. In contrast to previous attempts to this effect, we introduce a light-weight query language that does not require the user to know the details of the underlying syntactic representations; instead, the user queries the corpus by providing an example sentence coupled with simple markup. Search is performed at an interactive speed due to efficient linguistic graph…
Interactive Extractive Search over Biomedical Corpora
A light-weight query language is introduced that does not require the user to know the details of the underlying linguistic representations; instead, the user queries the corpus by providing an example sentence coupled with simple markup, allowing for rapid exploration, development and refinement of user queries.
Neural Extractive Search
The goals of this paper are to concisely introduce the extractive-search paradigm and to demonstrate a prototype neural retrieval system for extractive search, along with its benefits and potential.
Bootstrapping Relation Extractors using Syntactic Search by Examples
This work proposes a process for bootstrapping training datasets which can be performed quickly by non-NLP-experts and takes advantage of search engines over syntactic-graphs to obtain positive examples by searching for sentences that are syntactically similar to user input examples.
Measuring and Improving Consistency in Pretrained Language Models
The creation of PARAREL, a high-quality resource of cloze-style query English paraphrases, and analysis of the representational spaces of PLMs suggest that they have a poor structure and are currently not suitable for representing knowledge in a robust way.
A Search Engine for Discovery of Scientific Challenges and Directions
Keeping track of scientific challenges, advances and emerging directions is a fundamental part of research. However, researchers face a flood of papers that hinders discovery of important knowledge.
A Search Engine for Discovery of Biomedical Challenges and Directions
The ability to keep track of scientific challenges, advances and emerging directions is a fundamental part of research. However, researchers face a flood of papers that hinders discovery of important…
Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation
Recent works have found evidence of gender bias in models of machine translation and coreference resolution using mostly synthetic diagnostic datasets. While these quantify bias in a controlled…
Use of Formal Ethical Reviews in NLP Literature: Historical Trends and Current Practices
A detailed quantitative and qualitative analysis of the ACL Anthology is conducted, and the trends in the field are compared to those of other related disciplines, such as cognitive science, machine learning, data mining, and systems.
CSFCube - A Test Collection of Computer Science Research Articles for Faceted Query by Example
This work introduces the task of faceted Query by Example in which users can also specify a finer grained aspect in addition to the input query document, and describes an expert annotated test collection to evaluate models trained to perform this task.
Pynsett: A programmable relation extractor
A programmable relation extraction method for the English language by parsing texts into semantic graphs, ideal for extracting specialized ontologies in a limited collection of documents.


Odinson: A Fast Rule-based Information Extraction Framework
Odinson is a rule-based information extraction framework that couples a simple yet powerful pattern language, which can operate over multiple representations of text, with a runtime system that operates in near real time to guarantee the rapid matching of patterns.
Fangorn: A System for Querying very large Treebanks
An efficient web-based system for querying very large treebanks called Fangorn, which implements an XPath-like query language extended with a linguistic operator to capture proximity in the terminal sequence.
Example-Based Treebank Querying
GrETEL (Greedy Extraction of Trees for Empirical Linguistics) is developed, a query engine in which linguists can use natural language examples as a starting point for searching the Lassy treebank without knowledge of tree representations or formal query languages.
Dep_search: Efficient Search Tool for Large Dependency Parsebanks
Improved versions of the syntactic analysis query toolkit dep_search, geared towards morphologically rich languages and large parsebanks, are presented, including better data indexing, a better database backend, document metadata support, API access and an improved web user interface.
Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees
This work introduces the proposed five step workflow for creating information extractors, the graph query based rule language, as well as the core features of the PROPMINER tool.
The Linguist’s Search Engine: An Overview
The Linguist's Search Engine (LSE) was designed to provide an intuitive, easy-to-use interface that enables language researchers to seek linguistically interesting examples on the Web, based on…
Identifying Relations for Open Information Extraction
Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE^pos.
Exploratory Relation Extraction in Large Text Corpora
This paper proposes and demonstrates Exploratory Relation Extraction, a novel approach to identifying and extracting relations from large text corpora based on user-driven and data-guided incremental exploration and presents an interactive workflow that allows users to build extractors based on entity types and human-readable extraction patterns derived from subtrees in dependency trees.
A Domain-independent Rule-based Framework for Event Extraction
This work describes the design, development, and API of ODIN (Open Domain INformer), a domain-independent, rule-based event extraction (EE) framework that was used to develop a grammar for the biochemical domain, which approached human performance.
SeRQL: A Second Generation RDF Query Language
This position paper introduces a set of general requirements for an RDF query language, compiled from discussions between RDF implementors, experience and user feedback, and goes on to show how these requirements have been compiled into drafting the SeRQL query language.