Phrase-based clause extraction for open information extraction system
Information Extraction is an important task in Natural Language Processing, consisting of finding a structured representation for the information expressed in natural language text. Two key steps in information extraction are identifying the entities mentioned in the text, and the relations among those entities. In the context of Information Extraction for the World Wide Web, unsupervised relation extraction methods, also called Open Relation Extraction (ORE) systems, have become prevalent, due to their effectiveness without domain-specific training data. In general, these systems exploit part-of-speech tags or semantic information from the sentences to determine whether or not a relation exists, and if so, its predicate. This paper discusses some of the issues that arise when even moderately complex sentences are fed into ORE systems. A process for re-structuring such sentences is discussed and evaluated. The proposed approach replaces complex sentences by several others that, together, convey the same meaning and are more amenable to extraction by current ORE systems. The results of an experimental evaluation show that this approach succeeds in reducing the processing time and increasing the accuracy of the state-of-the-art ORE systems.