Qingzhao Tan

The information on the World Wide Web is growing without bound. Users may have very diversified preferences in the pages they target through a search engine. It is therefore a challenging task to adapt a search engine to suit the needs of a particular community of users who share similar interests. In this paper, we propose a new algorithm, Ranking SVM in a ...
When crawling resources (for example, the number of machines, crawl time, and so on) are limited, a crawler has to decide an optimal order in which to crawl and recrawl Web pages. Ideally, crawlers should request only those Web pages that have changed since the last crawl; in practice, a crawler may not know whether a Web page has changed before downloading ...
Scientists often search the Web for articles related to a particular chemical. When a scientist searches for a chemical formula using a search engine today, she gets articles where the exact keyword string expressing the chemical formula is found. Searching for the exact occurrence of keywords results in two problems for this ...
Skyline queries have received a lot of attention from the database and information retrieval research communities. A skyline query returns the set of data objects that are not dominated by any other data object in a given dataset. However, most existing studies focus on skyline query processing in centralized systems. Only recently, skyline queries are ...
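The dominance definition in the skyline abstract above can be illustrated with a minimal centralized sketch. It assumes smaller values are better in every dimension, and the names `dominates`, `skyline`, and the sample `hotels` data are hypothetical; it is a naive pairwise comparison, not the method of the paper.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a dominates b.

    Assumption: smaller is better in every dimension. a dominates b when
    a is no worse in all dimensions and strictly better in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach); lower is better for both.
hotels = [(50, 8), (80, 2), (60, 9), (90, 3), (70, 5)]
print(skyline(hotels))
```

Here (60, 9) is dropped because (50, 8) is both cheaper and closer, and (90, 3) is dropped because (80, 2) dominates it; the three remaining hotels each represent a different trade-off.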
Due to resource constraints, search engines usually have difficulties keeping the local database completely synchronized with the Web. To detect as many changes as possible, the crawler used by a search engine should be able to predict the change behavior of webpages so that it can use the limited resource to download those webpages that are most likely to ...
Due to resource constraints, Web archiving systems and search engines usually have difficulties keeping the entire local repository synchronized with the Web. We advance the state of the art of sampling-based synchronization techniques by answering a challenging question: Given a sampled webpage and its change status, which other webpages are also likely ...
The World Wide Web is growing and changing at an astonishing rate. Web information systems such as search engines have to keep up with the growth and change of the Web. Due to resource constraints, search engines usually have difficulties keeping the local database completely synchronized with the Web. In this paper, we study how to make good use of the ...
Most current metasearch engines provide uniform service to users but do not cater for the specific needs of individual users. To address this problem, research has been done on personalizing a metasearch engine. An interesting and practical approach is to optimize its ranking function using clickthrough data. However, it is still challenging to infer ...