Corpus ID: 74065

Open Language Learning for Information Extraction

@inproceedings{Mausam2012OpenLL,
  title={Open Language Learning for Information Extraction},
  author={Mausam and Michael Schmitz and Stephen Soderland and Robert Bart and Oren Etzioni},
  booktitle={Conference on Empirical Methods in Natural Language Processing},
  year={2012}
}
Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary sentences. However, state-of-the-art Open IE systems such as ReVerb and WOE share two important weaknesses: (1) they extract only relations that are mediated by verbs, and (2) they ignore context, thus extracting tuples that are not asserted as factual. This paper presents OLLIE, a substantially…
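
The verb-mediated extraction the abstract criticizes can be illustrated with a deliberately naive sketch (a toy illustration, not any system's actual algorithm; the hard-coded relation phrases are assumptions for the example):

```python
import re

def naive_verb_extraction(sentence):
    """Toy verb-mediated Open IE: split a simple SVO sentence on a
    small set of verb phrases and return an (arg1, relation, arg2)
    tuple. Real systems use POS tags and dependency parses, and learn
    relation phrases rather than hard-coding them."""
    # Hypothetical, hand-picked relation phrases for illustration only.
    pattern = re.compile(r"^(.*?)\s+(was born in|invented|is the capital of)\s+(.*?)[.]?$")
    m = pattern.match(sentence)
    if not m:
        return None  # noun-mediated relations are missed entirely
    return tuple(part.strip() for part in m.groups())

print(naive_verb_extraction("Einstein was born in Ulm."))
# ('Einstein', 'was born in', 'Ulm')
print(naive_verb_extraction("Microsoft co-founder Bill Gates"))
# None -- a noun-mediated relation, weakness (1) in the abstract
```

The second call shows weakness (1): "co-founder" expresses a relation, but no verb-centered pattern can recover it.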

Citations

Open Information Extraction Systems and Downstream Applications

  • Mausam
  • Computer Science
  • 2016
A decade of progress on building Open IE extractors is described, which results in the latest extractor, OPENIE4, which is computationally efficient, outputs n-ary and nested relations, and also outputs relations mediated by nouns in addition to verbs.

Open Information Extraction

This paper gives an overview of two Open IE generations, including their strengths, weaknesses and application areas, and exposes simple yet principled ways in which verbs express relationships in language, such as verb-phrase-based extraction or clause-based extraction.

Nested Propositions in Open Information Extraction

NESTIE is proposed, which uses a nested representation to extract higher-order relations and complex, interdependent assertions; nesting the extracted propositions allows NESTIE to more accurately reflect the meaning of the original sentence.

Open Information Extraction with Global Structure Constraints

A novel open IE system, called ReMine, is proposed, which integrates local context signal and global structural signal in a unified framework with distant supervision and can effectively score sentence-level tuple extractions based on corpus-level statistics.

Pattern Learning for Chinese Open Information Extraction

PLCOIE can extract binary relation triples as well as N-ary relation tuples, and experiments show that its results are more precise than those of state-of-the-art Chinese OIE systems, indicating that PLCOIE is feasible and effective.

Integrating Local Context and Global Cohesiveness for Open Information Extraction

This paper proposes a novel Open IE system, called ReMine, which integrates local context signals and global structural signals in a unified, distant-supervision framework that can be applied to many different domains to facilitate sentence-level tuple extractions using corpus-level statistics.

A Language Model for Extracting Implicit Relations

IMPLIE (Implicit relation Information Extraction) is presented, which uses an open-domain syntactic language model and user-supplied semantic taggers to extract relations that are implicit rather than explicitly stated.

Leveraging Linguistic Structure For Open Domain Information Extraction

This work replaces the large pattern sets of prior systems with a few patterns for canonically structured sentences, shifting the focus to a classifier that learns to extract self-contained clauses from longer sentences and then determines the maximally specific arguments for each candidate triple.

Boosting Open Information Extraction with Noun-Based Relations

This work presents a novel Open IE approach that extracts relations expressed in noun compounds, such as (oil, extracted from, olive) from “olive oil”, or in adjective-noun pairs (ANs), such as (moon, that is, gorgeous) from “gorgeous moon”.
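
The adjective-noun case in the abstract is mechanical enough to sketch (a toy illustration under assumed pre-tagged input; noun compounds like “olive oil” need world knowledge to pick a relation phrase, so they are skipped here):

```python
def an_pair_relations(tagged_tokens):
    """Sketch of noun-based relation extraction from adjective-noun
    pairs: each (adjective, noun) bigram yields a (noun, 'that is',
    adjective) triple. POS tags are supplied by hand; a real system
    would run a tagger or parser first."""
    triples = []
    for (w1, t1), (w2, t2) in zip(tagged_tokens, tagged_tokens[1:]):
        if t1 == "ADJ" and t2 == "NOUN":
            triples.append((w2, "that is", w1))
    return triples

print(an_pair_relations([("gorgeous", "ADJ"), ("moon", "NOUN")]))
# [('moon', 'that is', 'gorgeous')]
```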



References

Identifying Relations for Open Information Extraction

Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE-pos.
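
ReVerb's syntactic constraint requires a relation phrase to match the tag pattern V | VP | VW*P. A minimal sketch of that check over hand-assigned, simplified tag classes (the lexical constraint and the tagger itself are omitted):

```python
import re

# Simplified tag classes: V = verb (plus particle/adverb),
# W = noun/adjective/adverb/pronoun/determiner, P = preposition.
V, W, P = "V", "W", "P"

# ReVerb's syntactic constraint: V | VP | VW*P as a regex.
RELATION = re.compile(r"V(W*P)?")

def satisfies_constraint(tag_seq):
    """Return True iff a candidate relation phrase's simplified tag
    sequence matches ReVerb's V | VP | VW*P pattern."""
    return RELATION.fullmatch("".join(tag_seq)) is not None

print(satisfies_constraint([V, W, W, P]))  # e.g. "has a deal with" -> True
print(satisfies_constraint([W, V]))        # not a valid relation phrase -> False
```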

Open Information Extraction Using Wikipedia

WOE is presented, an open IE system which improves dramatically on TextRunner's precision and recall; its key idea is a novel form of self-supervised learning for open extractors, using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data.

Open Information Extraction from the Web

Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input, is introduced.

Open Information Extraction: The Second Generation

The second generation of Open IE systems are described, which rely on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE.

An analysis of open information extraction based on semantic role labeling

This work investigates the use of semantic role labeling techniques for the task of Open IE, comparing SRL-based open extractors with TextRunner, an open extractor that uses shallow syntactic analysis but can analyze many more sentences in a fixed amount of time and thus exploit corpus-level statistics.

Learning 5000 Relational Extractors

LUCHS is presented, a self-supervised, relation-specific IE system which learns 5025 relations --- more than an order of magnitude greater than any previous approach --- with an average F1 score of 61%.

Distant supervision for relation extraction without labeled data

This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.
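
The distant supervision paradigm can be sketched in a few lines (toy KB and sentences are invented for illustration; real systems align against a large KB such as Freebase and train a classifier on the resulting noisy labels):

```python
def distant_supervision_examples(kb_triples, sentences):
    """Sketch of distant supervision: any sentence mentioning both
    entities of a KB triple is labeled as a (noisy) positive training
    example for that relation -- no hand-labeled corpus needed."""
    examples = []
    for e1, relation, e2 in kb_triples:
        for sent in sentences:
            if e1 in sent and e2 in sent:
                examples.append((sent, e1, e2, relation))
    return examples

kb = [("Einstein", "born_in", "Ulm")]
sents = ["Einstein was born in Ulm in 1879.",
         "Einstein later left Ulm for Munich.",  # noisy match: not a birth
         "Ulm lies on the Danube."]
for ex in distant_supervision_examples(kb, sents):
    print(ex)
```

The second matched sentence shows why the labels are noisy: co-occurrence of the entity pair does not guarantee the relation is expressed.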

Combining linguistic and statistical analysis to extract relations from web documents

It is shown that this approach profits significantly when deep linguistic structures are used instead of surface text patterns; its benefits are demonstrated by extensive experiments with the prototype system LEILA.

Snowball: extracting relations from large plain-text collections

This paper develops a scalable evaluation methodology and metrics for the task, and presents a thorough experimental evaluation of Snowball and comparable techniques over a collection of more than 300,000 newspaper documents.
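
Snowball's bootstrapping loop can be sketched as follows (a toy version on invented data: patterns are literal between-entity strings and every pair is trusted, whereas real Snowball clusters vector-space patterns and scores pattern and tuple confidence):

```python
def bootstrap(corpus, seeds, rounds=2):
    """Sketch of Snowball-style bootstrapping for (org, city) pairs:
    induce patterns from sentences containing known pairs, then apply
    the patterns to harvest new pairs, and repeat."""
    pairs = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # 1. Induce a pattern: the literal text between a known pair.
        for sent in corpus:
            for org, city in pairs:
                if org in sent and city in sent:
                    start = sent.index(org) + len(org)
                    end = sent.index(city)
                    if start < end:
                        patterns.add(sent[start:end])
        # 2. Apply patterns to harvest new pairs from the corpus.
        for sent in corpus:
            for pat in patterns:
                if pat in sent:
                    left, _, right = sent.partition(pat)
                    org = left.split()[-1] if left.split() else None
                    city = right.split()[0].rstrip(".,") if right.split() else None
                    if org and city:
                        pairs.add((org, city))
    return pairs, patterns

corpus = ["Microsoft, based in Redmond, sells software.",
          "Boeing, based in Seattle, builds planes."]
pairs, patterns = bootstrap(corpus, {("Microsoft", "Redmond")})
print(sorted(pairs))
# [('Boeing', 'Seattle'), ('Microsoft', 'Redmond')]
```

From the single seed, the induced pattern ", based in " generalizes to the second sentence; scaling this loop to 300,000 documents is what makes the paper's evaluation methodology necessary.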

PORE: Positive-Only Relation Extraction from Wikipedia Text

The experimental results show that B-POL works effectively given only a small number of positive training examples, significantly outperforming the original positive-only learning approaches and a multi-class SVM.