Data Set Used
In an ad-hoc retrieval task, the query is usually short and the user expects to find the relevant documents in the first several result pages. We explored the possibility of using Wikipedia articles as an external corpus to expand ad-hoc queries. Results show promising improvements on measures that emphasize weak queries.
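To make the idea of expanding a short query from an external corpus concrete, here is a minimal sketch. It is not the method used in the work above: the overlap-based ranking, the function name `expand_query`, and the parameters `top_docs` and `extra_terms` are all assumptions chosen for illustration, and the "external corpus" is just a list of strings standing in for encyclopedia articles.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def expand_query(query, external_docs, top_docs=2, extra_terms=2,
                 stopwords=frozenset()):
    """Append frequent terms from the best-matching external documents.

    Hypothetical illustration: documents are ranked by how many of their
    tokens match a query term, and the most frequent non-query terms in
    the top-ranked documents are added to the query.
    """
    q_terms = tokenize(query)
    q_set = set(q_terms)

    # Score every external document by its token overlap with the query.
    scored = []
    for doc in external_docs:
        tokens = tokenize(doc)
        score = sum(1 for t in tokens if t in q_set)
        scored.append((score, tokens))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Keep only the top-ranked documents that match the query at all.
    feedback = [tokens for score, tokens in scored[:top_docs] if score > 0]

    # Count candidate expansion terms in the feedback documents.
    counts = Counter(
        t for tokens in feedback for t in tokens
        if t not in q_set and t not in stopwords
    )
    new_terms = [term for term, _ in counts.most_common(extra_terms)]
    return q_terms + new_terms
```

For example, `expand_query("jaguar cat", docs, top_docs=1, extra_terms=1)` would add the single most frequent non-query term from the best-matching document in `docs`. A real system would replace the overlap score with a proper retrieval model and weight the added terms rather than simply appending them.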
In our formal runs, we experimented with retrieval based on character-based indexing and hybrid term indexing because these more distinct types of indexing make for better pooling. We confirmed that character-based indexing did not produce relatively good retrieval effectiveness. We have also experimented with three new pseudo-relevance feedback…
A novel probabilistic retrieval model is presented. It forms a basis to interpret the TF-IDF term weights as making relevance decisions. It simulates the local relevance decision-making for every location of a document, and combines all of these “local” relevance decisions as the “document-wide” relevance decision for the document.…
Introduction: This is my personal "summary in 337 one-liners" of A Survey in Indexing and Searching XML Documents by Luk et al. (2002). I focus on technical aspects, omitting all system names and references. In my opinion, one cannot learn any technique from the survey: it only mentions various techniques but does not explain any. Alas, my 337…
With the advent of the Internet and intranets, substantial interest is being shown in Asian language information retrieval; especially in Chinese, which is a good example of an Asian ideographic language (other examples include Japanese and Korean). Since, in this type of language, spaces do not delimit words, an important issue is which index terms should…
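To illustrate why the choice of index terms matters when spaces do not delimit words, the sketch below produces two commonly discussed candidates from unsegmented text: single characters and overlapping character bigrams. The abstract above is truncated and does not say which index terms it considers, so this pairing, and the function names `char_unigrams` and `char_bigrams`, are assumptions for illustration only.

```python
def char_unigrams(text):
    """Return each non-whitespace character as its own index term."""
    return [ch for ch in text if not ch.isspace()]


def char_bigrams(text):
    """Return overlapping two-character index terms.

    No dictionary or word segmenter is needed: every adjacent pair of
    characters becomes a term, whether or not it is a real word.
    """
    chars = char_unigrams(text)
    return [a + b for a, b in zip(chars, chars[1:])]
```

For the four-character string `"信息检索"` ("information retrieval"), the unigram terms are the four characters, while the bigram terms are `信息`, `息检`, and `检索`. The middle bigram spans a word boundary, which shows the kind of noise that segmentation-free indexing can introduce.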