This paper studies document ranking under uncertainty. We address the general situation in which the relevance predictions of individual documents are uncertain and depend on one another. Inspired by Modern Portfolio Theory, an economic theory dealing with investment in financial markets, we argue that ranking under uncertainty is not ...
Browsing is an important part of how users search for information on the Web. In this paper, we present a browser plug-in called ESpotter, which recognizes entities of various types on Web pages and highlights them according to their types to assist user browsing. ESpotter uses a range of standard named entity recognition techniques. In ...
We present CORDER (COmmunity Relation Discovery by named Entity Recognition), an unsupervised machine learning algorithm that exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial ...
The Multimedia and Information Systems group at the Knowledge Media Institute of the Open University ...
The large number of Web pages on many Web sites has raised navigational problems. Markov chains have recently been used to model user navigational behavior on the World Wide Web (WWW). In this paper, we propose a method for constructing a Markov model of a Web site based on past visitor behavior. We use the Markov model to make link predictions that assist ...
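The idea of building a Markov model from past visitor behavior and using it for link prediction can be illustrated with a minimal Python sketch. The abstract is truncated and does not describe the construction, so everything here is an assumption: a first-order chain estimated from hypothetical click sessions, with the illustrative function names `build_transition_model` and `predict_links`. It is not the authors' method.

```python
from collections import Counter, defaultdict


def build_transition_model(sessions):
    """Estimate first-order transition probabilities from page sequences.

    `sessions` is a list of visits, each a list of page names in click order
    (a hypothetical input format chosen for this sketch).
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Count every observed (current page -> next page) step.
        for current_page, next_page in zip(session, session[1:]):
            counts[current_page][next_page] += 1

    model = {}
    for page, next_counts in counts.items():
        total = sum(next_counts.values())
        # Normalize the counts into probabilities that sum to 1 per page.
        model[page] = {nxt: n / total for nxt, n in next_counts.items()}
    return model


def predict_links(model, page, k=2):
    """Return up to k most probable next pages; ties break alphabetically."""
    candidates = model.get(page, {})
    return sorted(candidates, key=lambda p: (-candidates[p], p))[:k]


# Made-up sessions, for illustration only.
sessions = [
    ["home", "products", "cart"],
    ["home", "products", "reviews"],
    ["home", "about"],
    ["products", "cart"],
]
model = build_transition_model(sessions)
print(predict_links(model, "home"))         # ['products', 'about']
print(predict_links(model, "products", 1))  # ['cart']
```

A first-order chain conditions only on the current page; higher-order or smoothed variants would need more session data than this toy example provides.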
Most retrieval models estimate the relevance of each document to a query and rank the documents accordingly. However, such an approach ignores the uncertainty associated with the estimates of relevancy. If a high estimate of relevancy also has a high uncertainty, then the document may be very relevant or not relevant at all. Another document may have a ...
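To make the trade-off between a relevance estimate and its uncertainty concrete, here is a small Python sketch. The abstract is truncated before it states any scoring rule, so the rule used here (estimated relevance minus a risk-aversion weight times the variance) is an assumption borrowed from mean-variance analysis, and the names `risk_adjusted_ranking` and `risk_aversion` are hypothetical.

```python
def risk_adjusted_ranking(estimates, risk_aversion=1.0):
    """Rank documents by mean relevance penalized by variance.

    `estimates` maps a document id to a (mean, variance) pair.
    The score mean - risk_aversion * variance is an assumed rule
    for illustration, not one taken from the abstract.
    """
    scored = [
        (doc, mean - risk_aversion * variance)
        for doc, (mean, variance) in estimates.items()
    ]
    # Highest score first; ties break on document id for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return [doc for doc, _ in scored]


# Made-up estimates: d1 has the highest mean but is very uncertain.
estimates = {
    "d1": (0.80, 0.30),
    "d2": (0.70, 0.02),
    "d3": (0.50, 0.01),
}
print(risk_adjusted_ranking(estimates, risk_aversion=0.0))  # ['d1', 'd2', 'd3']
print(risk_adjusted_ranking(estimates, risk_aversion=1.0))  # ['d2', 'd1', 'd3']
```

With `risk_aversion=0.0` the ranking reduces to sorting by the estimate alone; a positive value demotes the uncertain document `d1` below the more reliable `d2`.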
TREC 2009 was the first year of the Chemical IR Track, which focuses on the evaluation of search techniques for discovering digitally stored information in chemical patents and academic journal articles. The track included two tasks: Prior Art (PA) and Technical Survey (TS). This paper describes how we designed the two tasks and presents the official ...
In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers ...