A Large Public Corpus of Web Tables containing Time and Context Metadata

  • Oliver Lehmberg, Dominique Ritze, Robert Meusel, C. Bizer
  • Published in WWW 2016
  • Computer Science
  • The Web contains vast amounts of HTML tables. [...] However, comparing the performance of the different systems is difficult because, until now, each system has been evaluated on a different corpus of Web tables, and most of these corpora are owned by large search engine companies and are thus not accessible to the public. In this poster, we present a large public corpus of Web tables which contains over 233 million tables and has been extracted from the July 2015 version of the CommonCrawl. By publishing…


    Matching Web Tables To DBpedia - A Feature Utility Study
    • 38
    Web-Scale Web Table to Knowledge Base Matching
    • 5
    Synthesizing N-ary Relations from Web Tables
    • 4
    Stitching Web Tables for Improving Matching Quality
    • 22
    Fusing time-dependent web table data
    • 6
    Generating Titles for Web Tables
    • 9
    Novel Entity Discovery from Web Tables
    • 4
    • Highly Influenced
    ColNet: Embedding the Semantics of Web Tables for Column Type Prediction
    • 21
    • Highly Influenced

