Leveraging Linguistic Structure For Open Domain Information Extraction

Gabor Angeli, Melvin Johnson, and Christopher D. Manning
Relation triples produced by open domain information extraction (open IE) systems are useful for question answering, inference, and other IE tasks. Traditionally, these are extracted using a large set of patterns; however, this approach is brittle on out-of-domain text and long-range dependencies, and gives no insight into the substructure of the arguments. We replace this large pattern set with a few patterns for canonically structured sentences, and shift the focus to a classifier which learns…
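The abstract's idea of matching a small pattern set against canonically structured clauses can be illustrated with a toy sketch. It assumes a hand-built dependency parse with Universal Dependencies-style labels (nsubj, obj, obl, case). The Token type, the extract_triples function, and its two patterns are hypothetical illustrations, not the paper's actual pattern set, and each argument is reduced to its head word rather than a full phrase.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Token:
    text: str
    head: int  # index of the head token; -1 marks the root
    dep: str   # dependency label (Universal Dependencies style, assumed)

Triple = Tuple[str, str, str]

def children(tokens: List[Token], i: int, dep: str) -> List[int]:
    """Indices of tokens attached to token i with label dep."""
    return [j for j, t in enumerate(tokens) if t.head == i and t.dep == dep]

def extract_triples(tokens: List[Token]) -> List[Triple]:
    """Apply two hypothetical patterns for canonically structured clauses:
    (1) nsubj <- VERB -> obj          => (subject, verb, object)
    (2) nsubj <- VERB -> obl -> case  => (subject, verb + preposition, oblique)
    """
    triples: List[Triple] = []
    for v, tok in enumerate(tokens):
        subjects = children(tokens, v, "nsubj")
        if not subjects:
            continue  # no subject: this token does not head a canonical clause
        subj = tokens[subjects[0]].text
        # Pattern 1: direct object becomes the second argument.
        for o in children(tokens, v, "obj"):
            triples.append((subj, tok.text, tokens[o].text))
        # Pattern 2: oblique argument; fold its preposition into the relation.
        for o in children(tokens, v, "obl"):
            preps = children(tokens, o, "case")
            rel = tok.text + (" " + tokens[preps[0]].text if preps else "")
            triples.append((subj, rel, tokens[o].text))
    return triples

# "Einstein lived in Princeton"
sentence = [
    Token("Einstein", 1, "nsubj"),
    Token("lived", -1, "root"),
    Token("in", 3, "case"),
    Token("Princeton", 1, "obl"),
]
print(extract_triples(sentence))  # [('Einstein', 'lived in', 'Princeton')]
```

A sentence that does not fit either canonical shape simply yields no triples, which is the brittleness the abstract attributes to pattern-only approaches.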


Open Relation Extraction and Grounding
This work proposes a novel importance-based open RE approach that exploits the global structure of a dependency tree to extract salient triples from large-scale corpora, leveraging KB triples and weighted context words associated with relational triples.
Open Information Extraction from Question-Answer Pairs
This paper describes NeurON, a system for extracting tuples from question-answer pairs that combines distributed representations of a question and an answer to generate knowledge facts. Experiments on two real-world datasets demonstrate that NeurON can find a significant number of new and interesting facts to extend a knowledge base, compared to state-of-the-art OpenIE methods.
On the Limits of Aligning OpenIE Extractions with Knowledge Bases (2020)
This study investigates how OpenIE extractions relate to KBs w.r.t. information content, and suggests that a significant part of such specific OpenIE triples can be expressed using KB formulas.
Transformer based network for Open Information Extraction
Zero-Shot Open Information Extraction using Question Generation and Reading Comprehension
This paper presents a zero-shot open information extraction technique that extracts entities (values) and their descriptions (keys) from a sentence using an off-the-shelf machine reading comprehension (MRC) model.
MinIE: Minimizing Facts in Open Information Extraction
An experimental study with several real-world datasets found that MinIE achieves competitive or higher precision and recall than most prior systems, while at the same time producing shorter, semantically enriched extractions.
Revisiting the Task of Scoring Open IE Relations
A simple baseline based on language modeling and trained with off-the-shelf programs is proposed; it gives competitive results under the previously defined protocol for this task and provides an independent source of signal for judging the plausibility of arbitrary facts.
QA4IE: A Question Answering based Framework for Information Extraction
A novel IE framework named QA4IE is proposed, which leverages flexible question answering (QA) approaches to produce high-quality relation triples across sentences, overcoming weaknesses of common IE solutions.
On Aligning OpenIE Extractions with Knowledge Bases: A Case Study
This paper directly evaluates how OIE triples from the OPIEC corpus relate to the DBpedia KB w.r.t. information content, and suggests that a significant part of OIE triples can be expressed by means of KB formulas instead of individual facts.
Supervised Open Information Extraction
This work presents a novel formulation of Open IE as a sequence tagging problem, which addresses challenges such as encoding multiple extractions for a predicate, together with a supervised model that outperforms existing state-of-the-art Open IE systems on benchmark datasets.
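The sequence-tagging formulation described in the "Supervised Open Information Extraction" entry can be illustrated by its decoding step alone: given one tag per token, group the tokens into argument and predicate spans. The tag names (B-A0, I-A0, B-P, I-P, B-A1), the decode_bio helper, and the one-span-per-role simplification are assumptions for illustration; the sketch does not reproduce that paper's model or its handling of multiple extractions per predicate.

```python
from typing import Dict, List, Optional

def decode_bio(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Group BIO-tagged tokens into spans keyed by role (e.g. 'A0', 'P', 'A1').

    Assumes tags of the form 'B-ROLE', 'I-ROLE', or 'O' (a hypothetical scheme).
    Only the first span of each role is kept in this simplified sketch.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans: Dict[str, List[str]] = {}
    current: Optional[str] = None  # role of the span currently being built
    for tok, tag in zip(tokens, tags):
        if tag == "O":
            current = None  # outside any span
            continue
        prefix, role = tag.split("-", 1)
        if prefix == "B":
            # Start a new span unless this role already has one.
            current = role if role not in spans else None
            if current is not None:
                spans[current] = [tok]
        elif prefix == "I" and current == role:
            spans[role].append(tok)  # continue the open span
        else:
            current = None  # ill-formed I- tag or repeated role: ignore it
    return {role: " ".join(words) for role, words in spans.items()}

tokens = ["Barack", "Obama", "was", "born", "in", "Hawaii"]
tags = ["B-A0", "I-A0", "B-P", "I-P", "I-P", "B-A1"]
print(decode_bio(tokens, tags))
# {'A0': 'Barack Obama', 'P': 'was born in', 'A1': 'Hawaii'}
```

Reading the result in A0, P, A1 order gives the relation triple; ignoring ill-formed tags keeps decoding total even when a tagger emits an I- tag without a matching B- tag.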


Adapting Open Information Extraction to Domain-Specific Relations
This work explores the steps needed to adapt Open IE to a domain-specific ontology and demonstrates an approach that maps domain-independent tuples to an ontology, using domains from DARPA’s Machine Reading Project.
Open Language Learning for Information Extraction
Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary
Identifying Relations for Open Information Extraction
Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE^pos.
Open question answering over curated and extracted knowledge bases
This paper presents OQA, the first approach to leverage both curated and extracted KBs, and demonstrates that it achieves up to twice the precision and recall of a state-of-the-art Open QA system.
Open Information Extraction Using Wikipedia
This paper presents WOE, an open IE system that improves dramatically on TextRunner's precision and recall and relies on a novel form of self-supervised learning for open extractors: it uses heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data.
Open Information Extraction to KBP Relations in 3 Hours
We participated in both the English Slot Filling and Entity Linking tasks in the 2013 TAC-KBP evaluation. Our Slot Filling system provides an answer to the following conjectures: Can Open Information
Effectiveness and Efficiency of Open Relation Extraction
This paper presents a fair and objective experimental comparison of 8 state-of-the-art approaches over 5 different datasets, shedding some light on the trade-off between NLP depth and effectiveness.
Distant supervision for relation extraction without labeled data
This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.
Combining Distant and Partial Supervision for Relation Extraction
This work presents an approach for providing partial supervision to a distantly supervised relation extractor using a small number of carefully selected examples, and proposes a novel criterion to sample examples which are both uncertain and representative.
Learning text analysis rules for domain-specific natural language processing
This thesis presents CRYSTAL, an implemented system that automatically induces domain-specific text analysis rules from training examples; the induced rules approach the performance of hand-coded rules, are robust to noise and inadequate features, and require only a modest amount of training data.
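The labeling heuristic behind the "Distant supervision for relation extraction without labeled data" entry above can be sketched in a few lines: if a knowledge base says a pair of entities holds some relation, every sentence mentioning both entities becomes a training example for that relation. This is a minimal sketch under stated assumptions: the toy KB, the relation name film-director, and the distant_label function are hypothetical, and entity mentions are found by plain substring matching rather than by an entity tagger.

```python
from typing import Dict, List, Tuple

# Hypothetical toy knowledge base: (entity 1, entity 2) -> relation name
KB = Dict[Tuple[str, str], str]
Example = Tuple[str, str, str, str]  # (sentence, entity 1, entity 2, relation)

def distant_label(sentences: List[str], kb: KB) -> List[Example]:
    """Label every sentence that mentions both entities of a KB pair.

    Entity mentions are detected by plain substring matching, a simplifying
    assumption; no human-labeled sentences are needed.
    """
    examples: List[Example] = []
    for sent in sentences:
        for (e1, e2), rel in kb.items():
            if e1 in sent and e2 in sent:
                examples.append((sent, e1, e2, rel))
    return examples

kb: KB = {("Steven Spielberg", "Saving Private Ryan"): "film-director"}
sentences = [
    "Steven Spielberg directed Saving Private Ryan.",
    "Saving Private Ryan is a war film.",
    "Steven Spielberg praised the cast of Saving Private Ryan.",
]
for example in distant_label(sentences, kb):
    print(example)
```

The heuristic is deliberately noisy: the third sentence is labeled film-director even though it does not express that relation, which is the trade-off accepted in exchange for corpora of any size without manual annotation.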