David Urbansky

In this paper, we introduce an approach for training a Named Entity Recognizer (NER) from a set of seed entities on the web. Creating training data for NERs is tedious and time-consuming, and it becomes more difficult as the set of entity types to be learned and recognized grows. Named Entity Recognition is a building block in natural language…
Web feeds are a popular way to access content updates on the World Wide Web. Unfortunately, the technology behind web feeds is based on polling: clients regularly ask the feed server for updates. There are two concurrent problems with this approach. First, a client often asks for updates when there is no new item, and second, if the client's…
Web feeds allow users to retrieve new content from pages on the World Wide Web. Feeds are offered by a multitude of web pages, ranging from conventional news sites to pages with user-generated content such as wikis, forums, or personal blogs. They notify interested readers of new content and are therefore interesting for information retrieval tasks…
To experiment properly, scientists from many research areas need large sets of real-world data. Information retrieval scientists, for example, often need to evaluate their algorithms on a dataset or a gold standard. The availability of these datasets is often insufficient, and authors with the same goal do not evaluate their approaches on the same data. To…