Memory-based methods for collaborative filtering predict new ratings by averaging (weighted) ratings between, respectively, pairs of similar users or items. In practice, a large number of ratings from similar users or similar items are not available, due to the sparsity inherent to rating data. Consequently, prediction quality can be poor. This paper …
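The user-based variant of the weighted averaging described above can be sketched as follows. This is a minimal illustration, not the method of the paper: the function names (`cosine_similarity`, `predict_user_based`), the choice of cosine similarity, and the toy ratings are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity over the items both users rated; 0.0 if there are none."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_user_based(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    `ratings` maps user -> {item: rating} (hypothetical layout).
    Returns None when no similar user has rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        weight_total += abs(weight)
    if weight_total == 0:
        return None
    return weighted_sum / weight_total


# Toy data (made up for illustration).
ratings = {
    "u1": {"a": 5, "b": 3},
    "u2": {"a": 5, "b": 3, "c": 4},
    "u3": {"x": 2, "c": 1},
}

print(predict_user_based(ratings, "u1", "c"))
print(predict_user_based(ratings, "u1", "x"))
```

In the toy data, only `u2` shares rated items with `u1`, so the prediction for item `c` rests on a single neighbour, and no prediction at all is possible for item `x`. This is the sparsity problem the abstract refers to.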
Within the INitiative for the Evaluation of XML Retrieval (INEX) a number of metrics to evaluate the effectiveness of content-oriented XML retrieval approaches were developed. Although these metrics provide a solution towards addressing the problem of overlapping result elements, they do not consider the problem of overlapping reference components within the …
In many domains of information retrieval, system estimates of document relevance are based on multidimensional quality criteria that have to be accommodated in a unidimensional result ranking. Current solutions to this challenge are often inconsistent with the formal probabilistic framework in which constituent scores were estimated, or use sophisticated …
We investigate to what extent people making relevance judgements for a reusable IR test collection are exchangeable. We consider three classes of judge: "gold standard" judges, who are topic originators and are experts in a particular information seeking task; "silver standard" judges, who are task experts but did not create topics; and "bronze standard" …
This article describes a new TREC Enterprise Track search test collection -- CERC. The collection is designed to represent some real-world search activity within the enterprise, using as a specific example the Commonwealth Scientific and Industrial Research Organisation (CSIRO). It has a deep crawl of CSIRO's public-facing information, that is very similar …
Implicit acquisition of user preferences makes log-based collaborative filtering favorable in practice to accomplish recommendations. In this paper, we follow a formal approach in text retrieval to re-formulate the problem. Based on the classic probability ranking principle, we propose a probabilistic user-item relevance model. Under this formal model, we …
Applications like multimedia retrieval require efficient support for similarity search on large data collections. Yet, nearest neighbor search is a difficult problem in high dimensional spaces, rendering efficient applications hard to realize: index structures degrade rapidly with increasing dimensionality, while sequential search is not an attractive …
Collaborative filtering is concerned with making recommendations about items to users. Most formulations of the problem are specifically designed for predicting user ratings, assuming past data of explicit user ratings is available. However, in practice we may only have implicit evidence of user preference; and furthermore, a better view of the task is of …
Keywords: XIRAF; Forensic digital investigation; XML database; Tool-integration; XQuery; Standoff annotation
Abstract: This paper describes a novel, XML-based approach towards managing and querying forensic traces extracted from digital evidence. This approach has been implemented in XIRAF, a prototype system for forensic analysis. XIRAF systematically …
CWI and University of Twente used PF/Tijah, a flexible XML retrieval system, to evaluate structured document retrieval, multimedia retrieval, and entity ranking tasks in the context of INEX 2007. For the retrieval of textual and multimedia elements in the Wikipedia data, we investigated various length priors and found that biasing towards longer elements …