Tom Rowlands

Web search engines discover indexable documents by recursively 'crawling' from a seed URL. Their rankings take into account link popularity. While this works well, it introduces biases towards older documents. Older documents are more likely to be the target of links, while new documents with few, or no, incoming links are unlikely to rank highly in search...
When a searcher submits a query Q and clicks on document R in the corresponding result set, we may plausibly interpret the click as a vote that Q is a description of R. We call the Q and R pairing a 'click description'. Click descriptions thus derived from search engine logs can be accumulated into surrogate documents and used to boost retrieval...
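The accumulation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the log format (a list of `(query, clicked_url)` pairs), the function names `build_surrogates` and `score`, the sample URLs, and the simple term-count scoring are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (query Q, clicked document R) pairs.
SAMPLE_LOG = [
    ("cheap flights", "a.example/flights"),
    ("flights to paris", "a.example/flights"),
    ("paris hotels", "b.example/hotels"),
]


def build_surrogates(click_log):
    """Accumulate click descriptions into one bag of query terms per document."""
    surrogates = defaultdict(Counter)
    for query, url in click_log:
        # Each click is treated as a vote that the query describes the document.
        surrogates[url].update(query.lower().split())
    return surrogates


def score(surrogates, query):
    """Score each document by how often its surrogate contains the query terms.

    Assumed scoring: a plain sum of term counts, chosen only for simplicity.
    """
    terms = query.lower().split()
    scores = {}
    for url, bag in surrogates.items():
        total = sum(bag[t] for t in terms)
        if total > 0:
            scores[url] = total
    return scores


surrogates = build_surrogates(SAMPLE_LOG)
ranking = score(surrogates, "paris flights")
```

In a real system the surrogate score would more plausibly be combined with a content-based score rather than used alone; the sketch only shows how repeated clicks concentrate descriptive terms on a document.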
In real-world use of test collection methods, it is essential that the query test set be representative of the workload expected in the actual application. Using a random sample of queries from a media company's query log as a 'gold standard' test set, we demonstrate that biases in sitemap-derived and top-n query sets can lead to significant...
Tuning a search facility such as a Web search engine, or an enterprise search tool deployed in a particular organisation, is an economically important activity. Intuitively, an important end goal of tuning should be to maximise satisfaction across the searchers who will use the facility. Tuning should therefore use an unbiased sample of actual search...
Tags and emergent folksonomies are a potentially rich new source of document annotations, offering query-independent and query-dependent evidence for exploitation by information retrieval systems. Previous research has shown that tags may facilitate improved web search in an environment where each tagging action generates a (user, tag, resource) triple. For...
Numerous industry studies document the serious productivity cost incurred by knowledge-reliant organisations when employees are unable to locate the information resources they need to do their jobs, or take too long to do so. The traditional approach to avoiding this problem has been to impose an organisation-specific taxonomy and to tag documents with...