Through BM25, the asymptotic term frequency quantification TF = tf/(tf+K), where tf is the within-document term frequency and K is a normalisation factor, became popular. This paper reports a finding regarding the meaning of the TF quantification: in the triangle of independence and subsumption, the TF quantification forms the altitude, that is, the middle …
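The saturating behaviour of the quantification TF = tf/(tf+K) can be illustrated with a minimal sketch. The function name `asymptotic_tf` and the default K = 1.0 are assumptions made for illustration only; in BM25 the factor K typically depends on document length, which is treated as a constant here.

```python
def asymptotic_tf(tf: float, k: float = 1.0) -> float:
    """Return the asymptotic quantification tf / (tf + k).

    tf: within-document term frequency (non-negative).
    k:  normalisation factor (positive); a fixed constant in this sketch.
    """
    if tf < 0 or k <= 0:
        raise ValueError("tf must be >= 0 and k must be > 0")
    return tf / (tf + k)


if __name__ == "__main__":
    # The value grows with tf but stays below 1.
    for tf in (0, 1, 2, 5, 10, 100):
        print(tf, round(asymptotic_tf(tf), 3))
```

Each additional occurrence of a term adds less than the previous one, so the value approaches 1 without reaching it.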
This paper presents a probabilistic relational modelling (implementation) of the major probabilistic retrieval models. Such a high-level implementation is useful since it supports the ranking of any object, it allows for the reasoning across structured and unstructured data, and it gives the software (knowledge) engineer control over ranking and thus …
This paper proposes a probabilistic logic abstraction for modelling tf-boosting approaches to anchor text retrieval, adapted for the task of page-search in books. The underlying idea is to view the back-of-book index (BoBI) as a list of anchors pointing to pages in the book. First, we model the direct application of hypertext-based tf boosting to books and …
The enterprise track caught our attention, since the task is similar to a project we carried out for the BBC. Our motivation for participation has been twofold: On the one hand, there is the usual challenge to design and test the quality of retrieval strategies. On the other hand, and for us very important, the TREC participation has been an opportunity to …
The report presents our studies to improve the efficiency of DB+IR integration technology. The three main contributions are: 1. we propose a novel top-k mechanism called lazy top-k, discuss the general top operation in DB+IR systems, and design a histogram inverted index to support the lazy top-k algorithms; 2. by studying the probabilistic …