Learn More
Memory-based approaches for collaborative filtering identify the similarity between two users by comparing their ratings on a set of items. In the past, the memory-based approach has been shown to suffer from two fundamental problems: data sparsity and difficulty in scalability. Alternatively, the model-based approach has been proposed to alleviate these(More)
Very large-scale classification taxonomies typically have hundreds of thousands of categories, deep hierarchies, and skewed category distribution over documents. However, it is still an open question whether the state-of-the-art technologies in automated text categorization can scale to (and perform well on) such large taxonomies. In this paper, we report(More)
As the competition of Web search market increases, there is a high demand for personalized Web search to conduct retrieval incorporating Web users' information needs. This paper focuses on utilizing clickthrough data to improve Web search. Since millions of searches are conducted everyday, a search engine accumulates a large volume of clickthrough data,(More)
Most traditional text clustering methods are based on "bag of words" (<i>BOW</i>) representation based on frequency statistics in a set of documents. <i>BOW</i>, however, ignores the important information on the semantic relationships between <i>key</i> terms. To overcome this problem, several methods have been proposed to enrich text representation with(More)
Link analysis algorithms have been extensively used in Web information retrieval. However, current link analysis algorithms generally work on a flat link graph, ignoring the hierarchal structure of the Web graph. They often suffer from two problems: the sparsity of link graph and biased ranking of newly-emerging pages. In this paper, we propose a novel(More)
Demographic information plays an important role in personalized web applications. However, it is usually not easy to obtain this kind of personal data such as age and gender. In this paper, we made a first approach to predict users' gender and age from their Web browsing behaviors, in which the Webpage view information is treated as a hidden variable to(More)
Text classification has been widely used to assist users with the discovery of useful information from the Internet. However, traditional classification methods are based on the “Bag of Words” (BOW) representation, which only accounts for term frequency in the documents, and ignores important semantic relationships between key terms. To overcome this(More)
The performance of web search engines may often deteriorate due to the diversity and noisy information contained within web pages. User click-through data can be used to introduce more accurate description (metadata) for web pages, and to improve the search performance. However, noise and incompleteness, sparseness, and the volatility of web pages and(More)
Web-page classification is much more difficult than pure-text classification due to a large variety of noisy information embedded in Web pages. In this paper, we propose a new Web-page classification algorithm based on Web summarization for improving the accuracy. We first give empirical evidence that ideal Web-page summaries generated by human editors can(More)