Triplifying Wikipedia's Tables


We are currently investigating methods to triplify the content of Wikipedia’s tables. We propose that existing knowledge bases can be leveraged to semi-automatically extract high-quality facts (in the form of RDF triples) from tables embedded in Wikipedia articles (henceforth called “Wikitables”). We present a survey of Wikitables and their content in a recent dump of Wikipedia. We then discuss ongoing work on using DBpedia to mine novel RDF triples from these tables: we present methods that automatically extract 24.4 million raw triples from the Wikitables at an estimated precision of 52.2%. We believe this precision can be greatly improved through machine-learning methods, and we sketch ideas for features that should help classify triples as correct or incorrect.
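To make the idea of leveraging an existing knowledge base concrete, here is a minimal sketch of one plausible reading of the approach. It is not the authors' pipeline. It assumes that table cells have already been resolved to entity identifiers, and it uses a tiny in-memory set of triples as a stand-in for DBpedia. The names `KB`, `TABLE`, `candidate_triples`, and `min_support` are hypothetical and exist only for illustration. If the knowledge base already relates the entities in two columns for enough rows, the same predicate is proposed for the remaining rows.

```python
from collections import Counter
from itertools import permutations

# Toy stand-in for a knowledge base such as DBpedia (assumption):
# a set of (subject, predicate, object) triples.
KB = {
    ("Lionel_Messi", "team", "FC_Barcelona"),
    ("Xavi", "team", "FC_Barcelona"),
}

# Toy table whose cells are assumed to be already linked to entities.
TABLE = [
    ["Lionel_Messi", "FC_Barcelona"],
    ["Xavi", "FC_Barcelona"],
    ["Andres_Iniesta", "FC_Barcelona"],
]


def candidate_triples(table, kb, min_support=2):
    """Propose triples that are missing from kb, based on column-pair evidence."""
    # Count, per ordered column pair and predicate, how many rows the
    # knowledge base already supports.
    support = Counter()
    for row in table:
        for i, j in permutations(range(len(row)), 2):
            for s, p, o in kb:
                if s == row[i] and o == row[j]:
                    support[(i, j, p)] += 1

    # For well-supported (column pair, predicate) combinations, emit the
    # triples for rows that the knowledge base does not yet cover.
    proposed = set()
    for (i, j, p), count in support.items():
        if count >= min_support:
            for row in table:
                triple = (row[i], p, row[j])
                if triple not in kb:
                    proposed.add(triple)
    return proposed


if __name__ == "__main__":
    for triple in sorted(candidate_triples(TABLE, KB)):
        print(triple)
```

Candidates produced this way are raw: some will be wrong, which is why a precision estimate and a later classification step matter. As a rough reading of the figures above, 52.2% of 24.4 million raw triples corresponds to about 12.7 million correct triples.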

Cite this paper

@inproceedings{Muoz2013TriplifyingWT, title={Triplifying Wikipedia's Tables}, author={Emir Mu{\~n}oz and Aidan Hogan and Alessandra Mileo}, booktitle={LD4IE@ISWC}, year={2013} }