Open Information Extraction: The Second Generation

@inproceedings{Etzioni2011OpenIE,
  title={Open Information Extraction: The Second Generation},
  author={Oren Etzioni and Anthony Fader and Janara Christensen and Stephen Soderland and Mausam},
  booktitle={IJCAI},
  year={2011}
}
How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web. In 2007, we introduced the Open Information Extraction (Open IE) paradigm which eschews hand-labeled training examples, and avoids domain-specific verbs and nouns, to develop unlexicalized, domain-independent extractors that scale to the Web corpus. Open IE systems have extracted…
Multilingual Open Information Extraction
TLDR
A multilingual rule-based OIE method that takes as input dependency parses in the CoNLL-X format, identifies argument structures within the dependency parses, and extracts a set of basic propositions from each argument structure, obtaining higher recall and higher precision than existing approaches relying on training data.
Open Language Learning for Information Extraction
Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary…
Open Information Extraction
TLDR
This paper gives an overview of two Open IE generations, including their strengths, weaknesses, and application areas, and exposes simple yet principled ways in which verbs express relationships in linguistics, such as verb phrase-based extraction or clause-based extraction.
Dependency-Based Open Information Extraction
TLDR
A new multilingual OIE system based on robust and fast rule-based dependency parsing that makes it possible to extract more precise assertions from text than state-of-the-art OIE systems, while keeping a crucial property of those systems: scaling to Web-size document collections.
Open Information Extraction Systems and Downstream Applications
  • Mausam
  • Computer Science
  • IJCAI
  • 2016
TLDR
A decade of progress on building Open IE extractors is described, which results in the latest extractor, OPENIE4, which is computationally efficient, outputs n-ary and nested relations, and also outputs relations mediated by nouns in addition to verbs.
Canonicalizing Open Knowledge Bases
TLDR
This paper presents an approach based on machine learning methods that can canonicalize such Open IE triples, by clustering synonymous names and phrases, thus shedding light on the middle ground between "open" and "closed" information extraction systems.
Out of Many, One: Unifying Web-Extracted Knowledge Bases
Extracting knowledge from large text corpora and the world wide web is an important problem in artificial intelligence. Arguably, the majority of the world’s knowledge is contained in natural…
Extraction Systems and Downstream Applications
Open Information Extraction (Open IE) extracts textual tuples comprising relation phrases and argument phrases from within a sentence, without requiring a pre-specified relation vocabulary. In this…
Parser Extraction of Triples in Unstructured Text
TLDR
A depth-first search traversal of the POS-tagged syntactic tree that appends predicate and object information, and the architecture of a language compiler for processing subject-predicate-object triples using the OpenNLP parser, are defined.
Open Information Extraction Based on Lexical-Syntactic Patterns
TLDR
A novel Open IE approach that performs unsupervised extraction of triples by applying a few lexical-syntactic patterns to POS-tagged texts is described, with results that surpass those of state-of-the-art systems.

References

SHOWING 1-10 OF 43 REFERENCES
Identifying Relations for Open Information Extraction
TLDR
Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOE^pos.
Open Information Extraction Using Wikipedia
TLDR
WOE is presented, an open IE system which improves dramatically on TextRunner's precision and recall and is a novel form of self-supervised learning for open extractors, using heuristic matches between Wikipedia infobox attribute values and corresponding sentences to construct training data.
Open Information Extraction from the Web
TLDR
Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input, is introduced.
Adapting Open Information Extraction to Domain-Specific Relations
TLDR
The steps needed to adapt Open IE to a domain-specific ontology are explored and the approach of mapping domain-independent tuples to an ontology using domains from DARPA’s Machine Reading Project is demonstrated.
The Tradeoffs Between Open and Traditional Relation Extraction
TLDR
A new model for Open IE called O-CRF is presented and it is shown that it achieves increased precision and nearly double the recall of the model employed by TEXTRUNNER, the previous state-of-the-art Open IE system.
Learning 5000 Relational Extractors
TLDR
LUCHS is presented, a self-supervised, relation-specific IE system which learns 5025 relations (more than an order of magnitude greater than any previous approach) with an average F1 score of 61%.
Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
TLDR
A novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts is presented.
Learning Information Extraction Rules for Semi-Structured and Free Text
TLDR
WHISK is designed to handle text styles ranging from highly structured to free text, including text that is neither rigidly formatted nor composed of grammatical sentences, and can also handle extraction from free text such as news stories.
Unsupervised Methods for Determining Object and Relation Synonyms on the Web
TLDR
This paper presents a scalable, fully-implemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K, and introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them.
Identifying Functional Relations in Web Text
TLDR
Leibniz is utilized to generate the first public repository of automatically-identified functional relations, exploiting the synergy between the Web corpus and freely-available knowledge resources such as Freebase to solve the challenge of determining whether a textual phrase denotes a functional relation.