This paper describes a system for entity extraction from the web. The system uses three different extraction techniques, which are tightly coupled with mechanisms for retrieving entity-rich web pages. The main contributions of this paper are a new entity retrieval approach, a comparison of different extraction techniques and a more precise entity extraction…
In this paper, we introduce an approach for training a Named Entity Recognizer (NER) from a set of seed entities on the web. Creating training data for NERs is tedious and time-consuming, and it becomes more difficult as the set of entity types to be learned and recognized grows. Named Entity Recognition is a building block in natural language…
Web feeds are a popular way to access updates for content on the World Wide Web. Unfortunately, the technology behind web feeds is based on polling: clients regularly ask the feed server for updates. There are two concurrent problems with this approach. First, a client often asks for updates when there is no new item, and second, if the client's…
The paper describes and evaluates a system for extracting knowledge from the web that uses a domain-independent fact extraction approach and a self-supervised learning algorithm. A trust algorithm improves the precision of the system to over 70%, compared with a baseline of 52%.
Web feeds allow users to retrieve new content from pages on the World Wide Web. Feeds are offered by a multitude of web pages, ranging from conventional news sites to pages with user-generated content such as wikis, forums or personal blogs. They notify interested readers of new content and are therefore interesting for information retrieval tasks.…
To experiment properly, scientists from many research areas need large sets of real-world data. Information retrieval scientists, for example, often need to evaluate their algorithms on a dataset or a gold standard. The availability of these datasets is often insufficient, and authors with the same goal do not evaluate their approaches on the same data. To…
Ontology engineering is the task of creating and refining a knowledge model for one or multiple domains. This process is difficult, and refining the ontology requires a great deal of time. In this paper, we present a system that semi-automatically aids the user in creating an ontology using a standard web browser. Our system helps to create ontologies…
In this paper, we show which difficulties arise when automatically constructing a domain-independent knowledge base from the web. We show possible applications for such a knowledge base to emphasize its importance. Current knowledge bases often use manually built patterns for extraction and quality assurance, which does not scale well. Our…