Triplifying Wikipedia's Tables

Abstract

We are currently investigating methods to triplify the content of Wikipedia's tables. We propose that existing knowledge bases can be leveraged to semi-automatically extract high-quality facts (in the form of RDF triples) from tables embedded in Wikipedia articles (henceforth called "Wikitables"). We present a survey of Wikitables and their content in a recent dump of Wikipedia. We then discuss ongoing work on using DBpedia to mine novel RDF triples from these tables: we present methods that automatically extract 24.4 million raw triples from the Wikitables at an estimated precision of 52.2%. We believe this precision can be greatly improved through machine-learning methods, and we sketch ideas for features that should help classify triples as correct or incorrect.
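To make the idea of leveraging an existing knowledge base concrete, the following is a minimal sketch of one plausible approach, not necessarily the method used in this work. It assumes that table cells have already been linked to entity identifiers, that the knowledge base is available as an in-memory set of (subject, predicate, object) tuples, and that a predicate already known to hold between two columns in some row may be proposed for the other rows. The function name `candidate_triples`, the `min_support` parameter, and the sample data are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def candidate_triples(rows, kb, min_support=1):
    """Propose novel triples for a table whose cells are linked to entities.

    rows: list of rows; each row is a list of entity IDs (None for unlinked cells).
    kb:   set of (subject, predicate, object) tuples already known.
    Assumption: a predicate seen between columns i and j in at least
    `min_support` rows is proposed for every other row of that column pair.
    """
    # Index the knowledge base by (subject, object) for fast predicate lookup.
    by_pair = {}
    for s, p, o in kb:
        by_pair.setdefault((s, o), set()).add(p)

    n_cols = max((len(row) for row in rows), default=0)
    novel = set()
    for i, j in permutations(range(n_cols), 2):
        # Keep only rows where both cells of this column pair are linked.
        pairs = [(r[i], r[j]) for r in rows
                 if len(r) > max(i, j) and r[i] and r[j]]
        # Count how many rows already support each predicate in the KB.
        support = Counter(p for pair in pairs for p in by_pair.get(pair, ()))
        for predicate, count in support.items():
            if count < min_support:
                continue
            for s, o in pairs:
                triple = (s, predicate, o)
                if triple not in kb:  # keep only triples the KB lacks
                    novel.add(triple)
    return novel


# Hypothetical sample data: a two-column table of players and clubs.
rows = [
    ["dbr:Lionel_Messi", "dbr:FC_Barcelona"],
    ["dbr:Xavi", "dbr:FC_Barcelona"],
]
kb = {("dbr:Lionel_Messi", "dbo:team", "dbr:FC_Barcelona")}
print(candidate_triples(rows, kb))
```

Raw candidates produced this way are expected to be noisy, which is consistent with the reported need for a subsequent classification step to separate correct from incorrect triples.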

4 Figures and Tables