Sunanda Patro

Popular web search engines use Boolean queries as the main interface through which users express their information needs. The paper presents the results of a user survey employing volunteer web searchers to determine the effectiveness of Boolean queries in meeting those needs. A metric for measuring the quality of a web search query is presented. This(More)
Record linkage identifies multiple records referring to the same entity even if they are not bit-wise identical. It is thus an essential technology for data integration and data cleansing. Existing record linkage approaches rely mainly on similarity functions based on the surface forms of the records, and hence are not able to identify complex(More)
This paper presents an algorithm to improve a web search query based on the feedback on the viewed documents. A user who is searching for information on the Web marks the retrieved (viewed) documents as relevant or irrelevant to further expose the information needs expressed in the original query. A new web search query matching this improved understanding(More)
This paper describes an algorithm whereby an initial, naïve user query to a search engine can be subsequently refined to improve both its recall and precision. This is achieved by manually classifying the documents retrieved by the original query into relevant and irrelevant categories, and then finding additional Boolean terms which successfully(More)
Clustering Web search results is a promising way to alleviate information overload for Web users. In this paper, we focus on clustering snippets returned by Google Scholar. We propose a novel similarity function based on mining domain knowledge and an outlier-conscious clustering algorithm. Experimental results showed improved effectiveness of the(More)
An algorithm to synthesise a web search query from example documents is described. A user searching for information on the Web can use a rudimentary query to locate a set of potentially relevant documents. The user classifies the retrieved documents as being relevant or irrelevant to his or her needs. A query can be synthesised from these categorised(More)