Neural Extractive Search

@article{Ravfogel2021NeuralES,
  title={Neural Extractive Search},
  author={Shaul Ravfogel and Hillel Taub-Tabib and Yoav Goldberg},
  journal={ArXiv},
  year={2021},
  volume={abs/2106.04612}
}
Domain experts often need to extract structured information from large corpora. We advocate for a search paradigm called “extractive search”, in which a search query is enriched with capture-slots, to allow for such rapid extraction. Such an extractive search system can be built around syntactic structures, resulting in high-precision, low-recall results. We show how the recall can be improved using neural retrieval and alignment. The goals of this paper are to concisely introduce the…
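The neural-retrieval step the abstract alludes to can be illustrated with a toy dense-retrieval loop: embed the query sentence and every corpus sentence, then rank corpus sentences by cosine similarity. This is only a sketch; the bag-of-words `embed` below is a hypothetical stand-in for the neural sentence encoder the paper relies on, and the capture-slot alignment step is omitted entirely.

```python
import numpy as np

def build_vocab(sentences):
    """Assign each corpus token an index (toy vocabulary)."""
    vocab = {}
    for s in sentences:
        for tok in s.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def embed(text, vocab):
    """Toy bag-of-words sentence encoder, a stand-in for a neural encoder."""
    vec = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query, corpus, vocab, k=1):
    """Rank corpus sentences by cosine similarity to the query."""
    q = embed(query, vocab)
    scores = np.array([q @ embed(s, vocab) for s in corpus])
    return [corpus[i] for i in np.argsort(-scores)[:k]]

corpus = [
    "aspirin reduces the risk of heart attack",
    "the weather in Paris was mild last week",
    "ibuprofen reduces inflammation and pain",
]
vocab = build_vocab(corpus)
print(retrieve("aspirin reduces fever", corpus, vocab, k=1))
# -> ['aspirin reduces the risk of heart attack']
```

Unlike an exact syntactic pattern, this kind of retrieval surfaces sentences that only partially overlap with the query, which is how neural retrieval trades some precision for recall.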


References

Showing 1–10 of 14 references
Syntactic Search by Example
A lightweight query language is introduced that does not require the user to know the details of the underlying syntactic representations; instead, the corpus is queried by providing an example sentence coupled with simple markup.
Interactive Extractive Search over Biomedical Corpora
A lightweight query language is introduced that does not require the user to know the details of the underlying linguistic representations; instead, the corpus is queried by providing an example sentence coupled with simple markup, allowing for rapid exploration, development, and refinement of user queries.
Extreme Extraction: Only One Hour per Relation
A novel system, InstaRead, streamlines extractor authoring with an ensemble of methods: encoding extraction rules in an expressive and compositional representation, guiding the user toward promising rules based on corpus statistics and mined resources, and introducing a new interactive development cycle that provides immediate feedback, even on large datasets.
Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees
This work introduces a five-step workflow for creating information extractors, a graph-query-based rule language, and the core features of the PROPMINER tool.
SciBERT: A Pretrained Language Model for Scientific Text
SciBERT leverages unsupervised pretraining on a large multi-domain corpus of scientific publications to improve performance on downstream scientific NLP tasks, demonstrating statistically significant improvements over BERT.
A Deep Look into Neural Ranking Models for Information Retrieval
Neural ranking models are examined along several dimensions, analyzing their underlying assumptions, major design principles, and learning strategies to build a comprehensive empirical understanding of existing techniques.
Odinson: A Fast Rule-based Information Extraction Framework
Odinson is a rule-based information extraction framework that couples a simple yet powerful pattern language, able to operate over multiple representations of text, with a runtime system that matches patterns in near real time.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT is a language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Large Scale Online Learning of Image Similarity Through Ranking
OASIS is an online dual approach that uses the passive-aggressive family of learning algorithms with a large-margin criterion and an efficient hinge-loss cost, suggesting that query-independent similarity can be learned accurately even for large-scale datasets that could not be handled before.
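The passive-aggressive update at the heart of OASIS can be sketched as a single step on a triplet (query, positive, negative) for a bilinear similarity. This is an illustrative simplification, not the authors' implementation; the PA-I-style step size and the aggressiveness parameter `C` are assumptions for the sketch.

```python
import numpy as np

def oasis_step(W, p, pos, neg, C=0.1):
    """One passive-aggressive update of the bilinear similarity
    S_W(a, b) = a @ W @ b, pushing p to score higher with pos than with neg."""
    loss = 1.0 - p @ W @ pos + p @ W @ neg   # hinge loss on the ranking margin
    if loss <= 0:
        return W                             # margin already satisfied: no update
    V = np.outer(p, pos - neg)               # gradient of the margin w.r.t. W
    tau = min(C, loss / (V * V).sum())       # clipped (PA-I style) step size
    return W + tau * V

D = 5
W = np.eye(D)   # start from the identity, i.e. plain dot-product similarity
p   = np.array([1., 0., 0., 0., 0.])
pos = np.array([0., 1., 0., 0., 0.])
neg = np.array([0., 0., 1., 0., 0.])
W = oasis_step(W, p, pos, neg)
print(p @ W @ pos - p @ W @ neg)   # margin after one update (now positive)
```

Because each step touches only the rank-one outer product of the triplet, updates stay cheap, which is what makes this family of methods attractive at the scales the paper targets.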
Billion-Scale Similarity Search with GPUs
This paper proposes a novel design enabling the construction of high-accuracy brute-force, approximate, and compressed-domain search based on product quantization, applied in different similarity search scenarios.
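To give a sense of what product quantization does, here is a minimal sketch (illustrative only, not the FAISS implementation): each vector is split into sub-vectors, and each sub-vector is replaced by the index of its nearest codebook centroid, so a long float vector compresses to a few small integers. Sampling data points as centroids stands in for the k-means training a real PQ index performs.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 8, 2, 4      # vector dim, number of sub-spaces, centroids per sub-space
SUB = D // M           # dimensionality of each sub-vector

data = rng.normal(size=(100, D)).astype(np.float32)

# "Train" per-subspace codebooks by sampling data points as centroids.
codebooks = np.stack([
    data[rng.choice(100, K, replace=False), m * SUB:(m + 1) * SUB]
    for m in range(M)
])  # shape (M, K, SUB)

def pq_encode(x):
    """Replace each sub-vector of x with the index of its nearest centroid."""
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        diffs = codebooks[m] - x[m * SUB:(m + 1) * SUB]
        codes[m] = np.argmin((diffs ** 2).sum(axis=1))
    return codes

def pq_decode(codes):
    """Reconstruct an approximate vector from its compact codes."""
    return np.concatenate([codebooks[m][codes[m]] for m in range(M)])

x = data[0]
codes = pq_encode(x)        # 2 small integers instead of 8 floats
approx = pq_decode(codes)   # lossy reconstruction of x
```

The compression is lossy but makes it feasible to keep billions of codes in memory and scan them quickly, which is the regime the paper addresses on GPUs.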