This paper describes a system for entity extraction from the web. The system uses three extraction techniques that are tightly coupled with mechanisms for retrieving entity-rich web pages. The main contributions of this paper are a new entity retrieval approach, a comparison of different extraction techniques, and a more precise entity extraction…
The COMCAD Working Paper Series is intended to aid the rapid distribution of work in progress, research findings and special lectures by researchers and associates of COMCAD. Papers aim to stimulate discussion among the worldwide community of scholars, policymakers and practitioners. They are distributed free of charge in PDF format via the COMCAD website.…
In this paper, we introduce an approach for training a Named Entity Recognizer (NER) on the web, starting from a set of seed entities. Creating training data for NERs is tedious and time-consuming, and it becomes more difficult as the set of entity types to be learned and recognized grows. Named Entity Recognition is a building block in natural language…
Web feeds are a popular way to access updates to content on the World Wide Web. Unfortunately, web feed technology is based on polling: clients regularly ask the feed server for updates. This approach has two problems. First, many times a client asks for updates, there is no new item; second, if the client's…
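To make the first polling problem concrete, the sketch below simulates a client that polls a feed at a fixed interval and counts the requests that return no new item. It is an illustration only, not the method proposed in the paper: the function name `count_empty_polls`, the fixed-interval policy, and the integer time units are assumptions chosen for the example.

```python
from bisect import bisect_right


def count_empty_polls(publish_times, poll_interval, horizon):
    """Hypothetical helper: simulate fixed-interval polling of a web feed.

    publish_times: sorted integer times at which the feed gains a new item.
    poll_interval: fixed time between two consecutive client requests.
    horizon: end of the simulated period (polls happen at poll_interval,
             2 * poll_interval, ... up to and including horizon).

    Returns (empty_polls, total_polls).
    """
    empty = 0
    total = 0
    last_poll = 0
    t = poll_interval
    while t <= horizon:
        # Items published in the window (last_poll, t] are new for this poll.
        new_items = bisect_right(publish_times, t) - bisect_right(publish_times, last_poll)
        total += 1
        if new_items == 0:
            empty += 1  # the request was wasted: nothing new on the server
        last_poll = t
        t += poll_interval
    return empty, total


# Three items published at t = 5, 47 and 52; the client polls every 10 units.
print(count_empty_polls([5, 47, 52], poll_interval=10, horizon=60))  # (3, 6)
```

In this toy run, half of the six requests find nothing new, which is the kind of wasted traffic the abstract describes.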
The paper describes and evaluates a system for extracting knowledge from the web that uses a domain-independent fact extraction approach and a self-supervised learning algorithm. A trust algorithm improves the precision of the system to over 70%, compared with a baseline of 52%.
Web feeds allow users to retrieve new content from pages on the World Wide Web. Feeds are offered by many websites, ranging from conventional news sites to pages with user-generated content such as wikis, forums, or personal blogs. They notify interested readers of new content and are therefore interesting for information retrieval tasks.…
To experiment properly, scientists from many research areas need large sets of real-world data. Information retrieval researchers, for example, often need to evaluate their algorithms on a dataset or a gold standard. Such datasets are often insufficiently available, and authors with the same goal do not evaluate their approaches on the same data. To…
The paper describes the concepts and realisation of vision teachlets for the interactive visualisation of algorithms in photogrammetry and image analysis. A series of web-based teachlets is being developed that allows students to learn photogrammetric techniques without temporal or spatial limitations. The teachlets are meant as an…