Arjen P. de Vries

Memory-based methods for collaborative filtering predict new ratings by averaging (weighted) ratings between, respectively, pairs of similar users or items. In practice, a large number of ratings from similar users or similar items are not available, due to the sparsity inherent to rating data. Consequently, prediction quality can be poor. This paper...
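As an illustration of the user-based variant described in the abstract above, here is a minimal sketch of a similarity-weighted rating average. It is not the method proposed in the paper (the abstract is truncated before that point). The function names, the choice of cosine similarity over co-rated items, and the toy ratings are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no similar user has rated the item, which is the
    sparsity problem the abstract refers to.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None


# Hypothetical toy data: user -> {item: rating}
ratings = {
    "ann": {"a": 5, "b": 3},
    "bob": {"a": 5, "b": 3, "c": 4},
    "cat": {"a": 1, "c": 2},
    "dan": {"d": 5},
}

print(predict_rating(ratings, "ann", "c"))  # averages bob's and cat's ratings
print(predict_rating(ratings, "ann", "d"))  # no overlapping neighbor
```

In this toy data, "cat" shares only one item with "ann", so the cosine over co-rated items is trivially maximal. That shows how sparse overlap can make similarity weights unreliable, in addition to leaving some predictions undefined.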
This paper presents a new approach for classifying individual video frames as being a ‘cartoon’ or a ‘photographic image’. The task arose from experiments performed at the TREC-2002 video retrieval benchmark: ‘cartoons’ are returned unexpectedly at high ranks even if the query gave only ‘photographic’ image examples. Distinguishing between the two genres...
Within the INitiative for the Evaluation of XML Retrieval (INEX), a number of metrics were developed to evaluate the effectiveness of content-oriented XML retrieval approaches. Although these metrics provide a solution towards addressing the problem of overlapping result elements, they do not consider the problem of overlapping reference components within the...
We investigate to what extent people making relevance judgements for a reusable IR test collection are exchangeable. We consider three classes of judge: "gold standard" judges, who are topic originators and are experts in a particular information seeking task; "silver standard" judges, who are task experts but did not create topics; and "bronze standard"...
Applications like multimedia retrieval require efficient support for similarity search on large data collections. Yet, nearest neighbor search is a difficult problem in high dimensional spaces, rendering efficient applications hard to realize: index structures degrade rapidly with increasing dimensionality, while sequential search is not an attractive...
The goal of the entity track is to perform entity-oriented search tasks on the World Wide Web. Many user information needs would be better answered by specific entities instead of just any type of documents. The track defines entities as “typed search results”, “things”, represented by their homepages on the web. Searching for entities thus corresponds to...
Crowdsourcing successfully strives to become a widely used means of collecting large-scale scientific corpora. Many research fields, including Information Retrieval, rely on this novel way of data acquisition. However, it seems to be undermined by a significant share of workers that are primarily interested in producing quick generic answers rather than...