Corpus ID: 15015161

Open Information Extraction Using Wikipedia

@inproceedings{Wu2010OpenIE,
  title={Open Information Extraction Using Wikipedia},
  author={Fei Wu and Daniel S. Weld},
  booktitle={ACL},
  year={2010}
}
Information-extraction (IE) systems seek to distill semantic relations from natural-language text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? This paper presents WOE, an open IE system which improves dramatically on TextRunner's precision and…
Open Language Learning for Information Extraction
Open Information Extraction (IE) systems extract relational tuples from text, without requiring a pre-specified vocabulary, by identifying relation phrases and associated arguments in arbitrary…
RDR-based open IE for the web document
TLDR
The key advantages of this approach are that it can handle the freer writing style that occurs in Web documents and can correct errors introduced by natural language pre-processing tools, whereas systems like TEXTRUNNER depend on the quality of the entity-tagging preprocessing in the training data.
An analysis of open information extraction based on semantic role labeling
TLDR
This work investigates the use of semantic role labeling techniques for the task of Open IE and compares SRL-based open extractors with TextRunner, an open extractor which uses shallow syntactic analysis but is able to analyze many more sentences in a fixed amount of time and thus exploit corpus-level statistics.
Nested Propositions in Open Information Extraction
TLDR
NESTIE is proposed, which uses a nested representation to extract higher-order relations and complex, interdependent assertions; nesting the extracted propositions allows NESTIE to more accurately reflect the meaning of the original sentence.
Methods for open information extraction and sense disambiguation on natural language text
TLDR
ClausIE is a principled method that relies on properties of the English language and thereby avoids the use of manually or automatically generated training data, and Werdy is an unsupervised approach, mainly relying on the syntactic and semantic relation established between a verb sense and its arguments.
Identifying Relations for Open Information Extraction
TLDR
Two simple syntactic and lexical constraints on binary relations expressed by verbs are introduced in the ReVerb Open IE system, which more than doubles the area under the precision-recall curve relative to previous extractors such as TextRunner and WOEpos.
Towards Large-Scale Unsupervised Relation Extraction from the Web
TLDR
A novel unsupervised algorithm is presented that provides a more general treatment of the polysemy and synonymy problems: it explicitly disambiguates polysemous relation phrases, groups synonymous ones, and achieves significant improvement in recall compared to the previous method.
Open Information Extraction: The Second Generation
TLDR
The second generation of Open IE systems is described, which relies on a novel model of how relations and their arguments are expressed in English sentences to double precision/recall compared with previous systems such as TEXTRUNNER and WOE.
Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
TLDR
A novel approach for multi-instance learning with overlapping relations that combines a sentence-level extraction model with a simple, corpus-level component for aggregating the individual facts is presented.
Open Information Extraction with Tree Kernels
TLDR
An unsupervised rule-based approach that can serve as a strong baseline for Open IE systems is proposed, along with multiple SVM models with dependency tree kernels that both extract explicit relations and confirm explicit relation words for two entities.

References

SHOWING 1-10 OF 37 REFERENCES
Open Information Extraction from the Web
TLDR
Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input, is introduced.
Learning 5000 Relational Extractors
TLDR
LUCHS is presented, a self-supervised, relation-specific IE system which learns 5025 relations (more than an order of magnitude greater than any previous approach) with an average F1 score of 61%.
Learning Syntactic Patterns for Automatic Hypernym Discovery
TLDR
This paper presents a new algorithm for automatically learning hypernym (is-a) relations from text, using "dependency path" features extracted from parse trees, and introduces a general-purpose formalization and generalization of these patterns.
Distant supervision for relation extraction without labeled data
TLDR
This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.
Wanderlust : Extracting Semantic Relations from Natural Language Text Using Dependency Grammar Patterns
A great share of applications in modern information technology can benefit from large-coverage, machine-accessible knowledge bases. However, the bigger part of today's knowledge is provided in the…
Learning to Extract Symbolic Knowledge from the World Wide Web
TLDR
The goal of the research described here is to automatically create a computer-understandable, world-wide knowledge base whose content mirrors that of the World Wide Web, and several machine learning algorithms for this task are described.
Exploiting Syntactic and Semantic Information for Relation Extraction from Wikipedia
TLDR
The preliminary results of the experiments strongly support the hypothesis that using information at a higher level of description is better for relation extraction on Wikipedia and show that the proposed method is promising for text understanding.
Information extraction from Wikipedia: moving down the long tail
TLDR
Three novel techniques for increasing recall from Wikipedia's long tail of sparse classes are presented: shrinkage over an automatically-learned subsumption taxonomy, a retraining technique for improving the training data, and supplementing results by extracting from the broader Web.
Open Knowledge Extraction through Compositional Language Processing
TLDR
Evaluation through manual assessment shows that well-formed propositions of reasonable quality, representing general world knowledge, given in a logical form potentially usable for inference, may be extracted in high volume from arbitrary input sentences.
Wikipedia Link Structure and Text Mining for Semantic Relation Extraction
TLDR
A consistent approach to semantic relation extraction from Wikipedia, consisting of three sub-processes highly optimized for Wikipedia mining: 1) fast preprocessing, 2) POS (part-of-speech) tag tree analysis, and 3) mainstay extraction.