We describe methods that take an example query from a known domain and use Web search engines to find the corresponding information in an unknown domain. Relational search is an effective way for users to obtain information in an unfamiliar field. For example, if an <i>Apple</i> user searches for <i>Microsoft</i> products, similar <i>Apple</i> products are important…
We propose a kernel method that uses combinations of features across example pairs to learn pairwise classifiers. Identifying two instances of the same class is an important technique in duplicate detection, entity matching, and other clustering problems. However, the problem is difficult when instances have few discriminative features. One typical…
The expansion of the Internet and the growth in the number of its users have raised many new problems in information retrieval. The most common way to find information on the web is to use web search engines. However, gathering information from the web is a difficult task for a novice user, even with search engines. The user must have experience and skill to find…
Domain-specific web search engines are effective tools for reducing the difficulty of acquiring information from the web. Existing methods for building domain-specific web search engines require human expertise or specific facilities. However, we can build a domain-specific search engine simply by adding domain-specific keywords, called "keyword spices," to…
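As a rough illustration of the keyword spice idea, the sketch below appends domain keywords to a user's query before it would be sent to a general-purpose search engine. The function name `add_keyword_spice`, the boolean query syntax, and the example spice terms are all assumptions made for illustration; the abstract does not specify how the spices are combined with the query.

```python
def add_keyword_spice(user_query, spice_terms):
    """Return the user's query extended with domain-specific keywords.

    Hypothetical helper: the AND/OR syntax is an assumed query format,
    not one taken from the paper or from any particular search engine.
    """
    if not spice_terms:
        # Nothing to add, so the query is passed through unchanged.
        return user_query
    spice = " OR ".join(spice_terms)
    return f"({user_query}) AND ({spice})"


# Example with made-up spice terms for a cooking-recipe domain.
query = add_keyword_spice("beef", ["ingredients", "tablespoon"])
print(query)
```

The point of the design is that the search engine itself is left untouched: only the query string changes, so the same approach could in principle be reused for another domain by swapping the spice terms.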
In this paper, we discuss problems with the HITS (Hyperlink-Induced Topic Search) algorithm, which capitalizes on hyperlinks to extract topic-bound communities of web pages. Despite its theoretically sound foundations, we observed that the HITS algorithm failed in real applications. To understand this problem, we developed a visualization tool, LinkViewer,…
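For readers unfamiliar with HITS, the following is a minimal sketch of the standard hub/authority iteration on a small link graph. It is a textbook formulation, not the authors' implementation; the function name `hits`, the dictionary-based graph format, and the fixed iteration count are assumptions.

```python
import math


def hits(links, iterations=50):
    """Compute hub and authority scores for a small link graph.

    links: dict mapping each page to a list of pages it links to
    (an assumed input format chosen for this sketch).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking to the page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}

        # Hub score: sum of authority scores of the pages it links to.
        new_hub = {p: sum(auth[d] for d in links.get(p, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return hub, auth


# Two pages ("a" and "b") both link to "c".
hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
```

In this toy graph, "c" receives all the authority and "a" and "b" share the hub score equally.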
This paper presents a new method for building domain-specific web search engines. Previous methods eliminate irrelevant documents from the pages accessed using heuristics based on human knowledge about the domain in question. Accordingly, they are hard to build and cannot be applied to other domains. The keyword spice method, in contrast, improves search…
A lot of future-related information is available in news articles and Web pages. This information can, however, differ to a large extent and may fluctuate over time. It is therefore difficult for users to manually compare and aggregate it, and to reconstruct the most probable course of future events. In this paper we approach the problem of automatically…
Pairwise classification has many applications, including network prediction, entity resolution, and collaborative filtering. The pairwise kernel has been proposed for those purposes by several research groups independently and has become successful in various fields. In this paper, we propose an efficient alternative, which we call the Cartesian kernel. While the…