Syntactic Search by Example
@inproceedings{Shlain2020SyntacticSB,
  title={Syntactic Search by Example},
  author={Micah Shlain and Hillel Taub-Tabib and Shoval Sadde and Yoav Goldberg},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2020}
}
We present a system that allows a user to search a large linguistically annotated corpus using syntactic patterns over dependency graphs. In contrast to previous attempts to this effect, we introduce a light-weight query language that does not require the user to know the details of the underlying syntactic representations; instead, the user queries the corpus by providing an example sentence coupled with simple markup. Search is performed at an interactive speed due to efficient linguistic graph…
23 Citations
Interactive Extractive Search over Biomedical Corpora
- Computer Science
- BIONLP
- 2020
A light-weight query language is introduced that does not require the user to know the details of the underlying linguistic representations; instead, the user queries the corpus by providing an example sentence coupled with simple markup, allowing for rapid exploration, development and refinement of user queries.
Neural Extractive Search
- Computer Science
- ACL
- 2021
The goals of this paper are to concisely introduce the extractive-search paradigm and to demonstrate a prototype neural retrieval system for extractive search, along with its benefits and potential.
Extracting semantic relations using syntax
- Computer Science
- Computational Communication Research
- 2021
The rsyntax R package is introduced, which is designed to make working with dependency trees easier and more intuitive for R users, and provides a framework for combining multiple rules for reliably extracting useful semantic relations.
Pynsett: A programmable relation extractor
- Computer Science
- ArXiv
- 2020
A programmable relation extraction method for the English language that parses texts into semantic graphs and is well suited to extracting specialized ontologies from a limited collection of documents.
Bootstrapping Relation Extractors using Syntactic Search by Examples
- Computer Science
- EACL
- 2021
This work proposes a process for bootstrapping training datasets that can be performed quickly by non-NLP-experts; it uses search engines over syntactic graphs to obtain positive examples by searching for sentences that are syntactically similar to user-provided examples.
GrASP: A Library for Extracting and Exploring Human-Interpretable Textual Patterns
- Computer Science
- LREC
- 2022
The library integrates a first public implementation of the existing GrASP algorithm, and allows users to extract patterns using a number of general-purpose built-in linguistic attributes, as envisaged for the original algorithm.
ClozeSearch: A Collocation Retrieval Application to Assist in Scientific Writing
- Computer Science
- Proceedings of the 31st ACM International Conference on Information & Knowledge Management
- 2022
This paper presents ClozeSearch, a slot-filling retrieval application for searching collocates to assist users in scientific writing, and proposes two alternatives, one based on a syntactic graph and one on a deep language model, for better flexibility in coping with long queries.
Extractive Search for Analysis of Biomedical Texts
- Computer Science
- SIGIR
- 2022
This work presents a two-stage system that creates custom datasets using a powerful mix of keyword and syntactic matching, and then returns lists of related words, which are used in downstream biomedical work.
Large Scale Substitution-based Word Sense Induction
- Computer Science
- ACL
- 2022
A word-sense induction method based on pre-trained masked language models (MLMs) is presented; it scales cheaply to large vocabularies and large corpora, and it can induce corpus-specific senses that may not appear in standard sense inventories.
Measuring and Improving Consistency in Pretrained Language Models
- Computer Science
- Transactions of the Association for Computational Linguistics
- 2021
The creation of PARAREL, a high-quality resource of English paraphrases of cloze-style queries, and an analysis of the representational spaces of PLMs suggest that these spaces are poorly structured and currently not suitable for representing knowledge in a robust way.
References
Odinson: A Fast Rule-based Information Extraction Framework
- Computer Science
- LREC
- 2020
Odinson is a rule-based information extraction framework that couples a simple yet powerful pattern language, which can operate over multiple representations of text, with a runtime system that operates in near real time to guarantee the rapid matching of patterns.
Fangorn: A System for Querying very large Treebanks
- Computer Science
- COLING
- 2012
Fangorn is an efficient web-based system for querying very large treebanks; it implements an XPath-like query language extended with a linguistic operator to capture proximity in the terminal sequence.
Example-Based Treebank Querying
- Computer Science
- LREC
- 2012
GrETEL (Greedy Extraction of Trees for Empirical Linguistics) is a query engine in which linguists can use natural language examples as a starting point for searching the Lassy treebank, without knowledge of tree representations or formal query languages.
Dep_search: Efficient Search Tool for Large Dependency Parsebanks
- Computer Science
- NODALIDA
- 2017
Improved versions of the syntactic analysis query toolkit dep_search, geared towards morphologically rich languages and large parsebanks, are presented; improvements include better data indexing, in particular a better database backend and document metadata support, as well as API access and an improved web user interface.
Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees
- Computer Science
- ACL
- 2013
This work introduces a five-step workflow for creating information extractors, a rule language based on graph queries, and the core features of the PROPMINER tool.
The Linguist’s Search Engine: An Overview
- Computer Science, Linguistics
- ACL
- 2005
The Linguist's Search Engine (LSE) was designed to provide an intuitive, easy-to-use interface that enables language researchers to seek linguistically interesting examples on the Web, based on…
Identifying Relations for Open Information Extraction
- Computer Science
- EMNLP
- 2011
Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE^pos.
Exploratory Relation Extraction in Large Text Corpora
- Computer Science
- COLING
- 2014
This paper proposes and demonstrates Exploratory Relation Extraction, a novel approach to identifying and extracting relations from large text corpora based on user-driven and data-guided incremental exploration. It also presents an interactive workflow that allows users to build extractors based on entity types and human-readable extraction patterns derived from subtrees in dependency trees.
A Domain-independent Rule-based Framework for Event Extraction
- Computer Science
- ACL
- 2015
This work describes the design, development, and API of ODIN (Open Domain INformer), a domain-independent, rule-based event extraction (EE) framework that was used to develop a grammar for the biochemical domain, which approached human performance.
SeRQL: A Second Generation RDF Query Language
- Computer Science
- 2003
This position paper introduces a set of general requirements for an RDF query language, compiled from discussions between RDF implementors, experience and user feedback, and goes on to show how these requirements have been compiled into drafting the SeRQL query language.