Katrin Braunschweig

Learn More
Following the Open Data trend, governments and public agencies have started making their data available on the Web and established platforms such as data.gov or data.un.org. These Open Data platforms provide a huge amount of data for various topics such as demographics, transport, finance or health in various data formats. One typical usage scenario for(More)
Entity augmentation is a query type in which, given a set of entities and a large corpus of possible data sources, the values of a missing attribute are to be retrieved. State of the art methods return a single result that, to cover all queried entities, is fused from a potentially large set of data sources. We argue that queries on large corpora of(More)
Platforms for publication and collaborative management of data, such as <i>Data.gov</i> or <i>Google Fusion Tables</i>, are a new trend on the web. They manage very large corpora of datasets, but often lack an integrated schema, ontology, or even just common publication standards. This results in inconsistent names for attributes of the same meaning, which(More)
Government initiatives for more transparency and participation have lead to an increasing amount of structured data on the web in recent years. Many of these datasets have great potential. For example, a situational analysis and meaningful visualization of the data can assist in pointing out social or economic issues and raising people’s awareness.(More)
In a traditional relational database management system, queries can only be defined over attributes defined in the schema, but are guaranteed to give single, definitive answer structured exactly as specified in the query. In contrast, an information retrieval system allows the user to pose queries without knowledge of a schema, but the result will be a(More)
Relational Web tables have become an important resource for applications such as factual search and entity augmentation. A major challenge for an automatic identification of relevant tables on the Web is the fact that many of these tables have missing or non-informative column labels. Research has focused largely on recovering the meaning of columns by(More)
Named entitye xtraction is an established researcha rea in the field of information extraction. When tailored to as pecific domain and with sufficientp re-labeled training data, state-of-the-art extraction algorithms have achieved near human performance. However, when presented with semi-structured data, informal text or unknown domains where training data(More)