Creating a Large Benchmark for Open Information Extraction

@inproceedings{Stanovsky2016CreatingAL,
  title={Creating a Large Benchmark for Open Information Extraction},
  author={Gabriel Stanovsky and Ido Dagan},
  booktitle={EMNLP},
  year={2016}
}
Open information extraction (Open IE) was presented as an unrestricted variant of traditional information extraction. It has been gaining substantial attention, manifested by a large number of automatic Open IE extractors and downstream applications. In spite of this broad attention, the Open IE task definition has been lacking: there are no formal guidelines and no large-scale gold standard annotation. Subsequently, the various implementations of Open IE resorted to small-scale post-hoc…
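
As a purely illustrative aside to the truncated abstract above, the sketch below shows one simple way a gold-standard benchmark of this kind can be used to score an extractor: predicted (arg1, relation, arg2) tuples are matched against gold tuples by token overlap and summarized as precision and recall. The tuple shape, the 0.5 overlap threshold, and the helper names (token_overlap, tuple_match, evaluate) are assumptions made for this sketch, not the paper's actual matching criterion.

# Minimal sketch (assumptions noted above): match Open IE extractions
# against gold tuples by token overlap and report precision/recall.

def token_overlap(pred: str, gold: str) -> float:
    """Fraction of gold tokens that also appear in the predicted phrase."""
    pred_tokens = set(pred.lower().split())
    gold_tokens = set(gold.lower().split())
    return len(pred_tokens & gold_tokens) / max(len(gold_tokens), 1)

def tuple_match(pred, gold, threshold=0.5):
    """A predicted (arg1, rel, arg2) matches gold if every element overlaps enough."""
    return all(token_overlap(p, g) >= threshold for p, g in zip(pred, gold))

def evaluate(predicted, gold):
    """Precision/recall over one sentence's extractions."""
    tp = sum(any(tuple_match(p, g) for g in gold) for p in predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = (sum(any(tuple_match(p, g) for p in predicted) for g in gold) / len(gold)) if gold else 0.0
    return precision, recall

predicted = [("Barack Obama", "was born in", "Hawaii")]
gold = [("Barack Obama", "born in", "Hawaii")]
print(evaluate(predicted, gold))  # (1.0, 1.0) under these assumptions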

Citations

Neural Open Information Extraction
TLDR: A neural Open IE approach with an encoder-decoder framework that learns highly confident arguments and relation tuples bootstrapped from a state-of-the-art Open IE system.
Supervised Open Information Extraction
TLDR: A novel formulation of Open IE as a sequence tagging problem, addressing challenges such as encoding multiple extractions for a predicate, and a supervised model that outperforms the existing state-of-the-art Open IE systems on benchmark datasets (an illustrative tagging sketch follows this list).
WiRe57: A Fine-Grained Benchmark for Open Information Extraction
TLDR: The non-trivial problem of evaluating the extractions produced by systems against the reference tuples is addressed, and the MinIE system is found to perform best.
MinIE: Minimizing Facts in Open Information Extraction
TLDR: An experimental study with several real-world datasets found that MinIE achieves competitive or higher precision and recall than most prior systems, while at the same time producing shorter, semantically enriched extractions.
Analysing Errors of Open Information Extraction Systems
TLDR: This comprehensive benchmark contains three data sets from the news domain and one data set from Wikipedia, with 4,522 labeled sentences and 11,243 binary or n-ary OIE relations in total, and compares the performance of four popular OIE systems.
Transformer based network for Open Information Extraction
Improving Open Information Extraction via Iterative Rank-Aware Learning
TLDR: This work finds that the extraction likelihood, a confidence measure used by current supervised Open IE systems, is not well calibrated when comparing the quality of assertions extracted from different sentences, and proposes an additional binary classification loss to calibrate the likelihood and make it more globally comparable.
Improving Open Information Extraction with Distant Supervision Learning
TLDR: A distant supervision learning approach is used to improve the Open IE task, employing two popular sequence-to-sequence models (RNN and Transformer) and a large benchmark data set to demonstrate the performance of this approach.
Towards a gold standard dataset for Open Information Extraction in Italian
TLDR: This work describes the creation of the first gold standard dataset for the validation of OIE approaches in Italian, manually built on the basis of solid linguistic foundations and used for testing an OIE application for the Italian language.
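
To make the sequence-tagging formulation mentioned in the Supervised Open Information Extraction entry above more concrete, here is a minimal sketch assuming a simple BIO label inventory (A0 for the first argument, P for the predicate, A1 for the second argument) over an invented example sentence; it is not the cited paper's exact tagging scheme.

# Illustrative only: one extraction encoded as a BIO tag sequence over the
# sentence's tokens, roughly in the spirit of casting Open IE as tagging.

tokens = ["Obama", "was", "born", "in", "Hawaii", "."]
tags   = ["B-A0",  "B-P", "I-P",  "I-P", "B-A1",  "O"]

def decode(tokens, tags):
    """Group contiguous B-/I- spans into (label, phrase) pairs."""
    spans, current_label, current = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((current_label, " ".join(current)))
            current_label, current = tag[2:], [tok]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current.append(tok)
        else:
            if current:
                spans.append((current_label, " ".join(current)))
            current_label, current = None, []
    if current:
        spans.append((current_label, " ".join(current)))
    return spans

print(decode(tokens, tags))
# [('A0', 'Obama'), ('P', 'was born in'), ('A1', 'Hawaii')]

Running the tagger once per candidate predicate, each with its own tag sequence, is one way such a formulation can emit multiple extractions per sentence.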

References

Showing 1-10 of 20 references.
Identifying Relations for Open Information Extraction
TLDR: Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE^pos.
KrakeN: N-ary Facts in Open Information Extraction
TLDR: The paper presents KrakeN, an OIE system specifically designed to capture N-ary facts, as well as the results of an experimental study on extracting facts from Web text in which the issue of fact completeness is examined.
Leveraging Linguistic Structure For Open Domain Information Extraction
TLDR: This work replaces the large pattern set of prior systems with a few patterns for canonically structured sentences, and shifts the focus to a classifier which learns to extract self-contained clauses from longer sentences to determine the maximally specific arguments for each candidate triple.
Open Information Extraction from the Web
TLDR: Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input, is introduced.
Open Language Learning for Information Extraction
Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences.
Open Information Extraction Using Wikipedia
TLDR: WOE is presented, an open IE system which improves dramatically on TextRunner's precision and recall through a novel form of self-supervised learning for open extractors: heuristic matches between Wikipedia infobox attribute values and corresponding sentences are used to construct training data.
Information extraction from Wikipedia: moving down the long tail
TLDR: Three novel techniques for increasing recall from Wikipedia's long tail of sparse classes are presented: shrinkage over an automatically-learned subsumption taxonomy, a retraining technique for improving the training data, and supplementing results by extracting from the broader Web.
ClausIE: clause-based open information extraction
TLDR: ClausIE is a novel, clause-based approach to open information extraction, which extracts relations and their arguments from natural language text using a small set of domain-independent lexica, operates sentence by sentence without any post-processing, and requires no training data.
Specifying and Annotating Reduced Argument Span Via QA-SRL
TLDR: A generic argument reduction criterion is proposed, along with an annotation procedure, and it is shown that it can be consistently and intuitively annotated using the recent QA-SRL paradigm.
Open IE as an Intermediate Structure for Semantic Tasks
TLDR: This paper studies Open Information Extraction's (Open IE) output as an additional intermediate structure and finds that for tasks such as text comprehension, word similarity and word analogy it can be very effective.