Learn More
With the growth of the Web, there has been a rapid increase in the number of users who need to access on-line databases without having a detailed knowledge of the schema or of query languages; even relatively simple query languages designed for non-experts are too complicated for them. We describe BANKS, a system which enables keyword-based search on(More)
A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks(More)
The rapid growth of the WorldWide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource discovery system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevant to a pre-defined set of topics. The topics are specified(More)
After the invention of the Web many things in our lives (such as how we do research, communicate, entertain ourselves, and do personal finance) have changed. Simply, the Web has changed our lives. Perhaps, it even contributed to the worldwide obesity problem, especially among the youth. Like other technological inventions, it made our lives easier, lazier,(More)
Relational, XML and HTML data can be represented as graphs with entities as nodes and relationships as edges. Text is associated with nodes and possibly edges. Keyword search on such graphs has received much attention lately. A central problem in this scenario is to efficiently extract from the data graph a small number of the " best " answer trees. A(More)
To take the first step beyond keyword-based search toward entity-based search, suitable token spans ("spots") on documents must be identified as references to real-world entities from an entity catalog. Several systems have been proposed to link spots on Web pages to entities in Wikipedia. They are largely based on local compatibility between the text(More)
We describe the design, prototyping and evaluation of ARC, a system for automatically compiling a list of authoritative web resources on any (sufficiently broad) topic. The goal of ARC is to compile resource lists similar to those provided by Yahoo! or Infoseek. The fundamental difference is that these services construct lists either manually or through a(More)
It is well known that after placing n balls independently and uniformly at random into n bins, the fullest bin holds Θ(log n/ log log n) balls with high probability. More recently, Azar et al. analyzed the following process: randomly choose d bins for each ball, and then place the balls, one by one, into the least full bin from its d choices [2]. They show(More)
We consider the problem of nding near-optimal solutions for a variety of NP-hard scheduling problems for which the objective is to minimize the total weighted completion time. Recent work has led to the development of several techniques that yield constant worst-case bounds in a number of settings. We continue this line of research by providing improved(More)