The Web has been rapidly "deepened" by the prevalence of databases online. With the potentially unlimited information hidden behind their query interfaces, this "deep Web" of searchable databases is clearly an important frontier for data access. This paper surveys this relatively unexplored frontier, measuring characteristics pertinent to both exploring and …
This paper introduces RankSQL, a system that provides a systematic and principled framework to support efficient evaluation of ranking (top-k) queries in relational database systems (RDBMS), by extending relational algebra and query optimization. Previously, top-k query processing was studied in the middleware scenario or in RDBMS in a …
This paper presents a principled framework for efficient processing of ad-hoc top-k (ranking) aggregate queries, which provide the k groups with the highest aggregates as results. Essential support of such queries is lacking in current systems, which process the queries in a naïve materialize-group-sort scheme that can be prohibitively …
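The naïve materialize-group-sort scheme mentioned in this abstract can be pictured with a minimal Python sketch. It is only an illustration of the baseline, not the paper's framework; the function name `topk_groups_naive`, the choice of SUM as the aggregate, and the sample rows are assumptions made for this example.

```python
from collections import defaultdict

def topk_groups_naive(rows, k):
    """Baseline top-k aggregate: materialize, group, sort, then cut.

    rows: iterable of (group_key, value) pairs (hypothetical input shape).
    Returns the k groups with the highest SUM(value), best first.
    """
    # Materialize + group: every row is read and every group is
    # fully aggregated, even groups that can never reach the top k.
    totals = defaultdict(float)
    for key, value in rows:
        totals[key] += value

    # Sort all groups by aggregate, then keep only the first k.
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

# Made-up sample data for illustration only.
rows = [("a", 3), ("b", 5), ("a", 4), ("c", 1), ("b", 1)]
print(topk_groups_naive(rows, 2))  # [('a', 7.0), ('b', 6.0)]
```

The cost of this baseline does not depend on k: all rows are aggregated and all groups are sorted before the top k are selected, which is the inefficiency the abstract points at.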
This paper proposes Facetedpedia, a faceted retrieval system for information discovery and exploration in Wikipedia. Given the set of Wikipedia articles resulting from a keyword query, Facetedpedia generates a faceted interface for navigating the result articles. Compared with other faceted retrieval systems, Facetedpedia is fully automatic and dynamic in …
This paper presents a source selection system based on an attribute co-occurrence framework for ranking and selecting Deep Web sources that provide information relevant to users' requirements. Given the huge number of heterogeneous Deep Web data sources, end users may not know which sources can satisfy their information needs. Selecting and ranking …
We present a novel query language for large-scale analysis of XML data on a map-reduce environment, called MRQL, that is expressive enough to capture most common data analysis tasks and at the same time is amenable to optimization. Our evaluation plans are constructed using a small number of higher-order physical operators that are directly implementable on …
With the emergence of the deep web, searching web databases in domains such as vehicles, real estate, etc., has become a routine task. One of the problems in this context is ranking the results of a user query. Earlier approaches for addressing this problem have used frequencies of database values, query logs, and user profiles. A common thread in most of …
MapReduce has become a common programming model for processing very large amounts of data, which is needed in a spectrum of modern computing applications. Today several MapReduce implementations and execution systems exist and many MapReduce programs are being developed and deployed in practice. However, developing MapReduce programs is not always an easy …
Objects with multiple numeric attributes can be compared within any "subspace" (subset of attributes). In applications such as computational journalism, users are interested in claims of the form: "Karl Malone is one of the only two players in NBA history with at least 25,000 points, 12,000 rebounds, and 5,000 assists in one's career". One challenge in …
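A claim of the form quoted in this abstract can be checked by counting the objects that meet every threshold in a chosen subspace. The Python sketch below shows only that check, under stated assumptions: the function name `matching_objects`, the player labels, and all statistics are invented for illustration and are not real NBA data or the paper's method.

```python
def matching_objects(objects, thresholds):
    """Return names of objects meeting every threshold in the subspace.

    objects: dict mapping name -> dict of numeric attributes.
    thresholds: dict mapping attribute -> minimum value; its keys
    define the subspace (subset of attributes) being compared.
    """
    return [
        name
        for name, attrs in objects.items()
        if all(attrs[attr] >= minimum for attr, minimum in thresholds.items())
    ]

# Made-up career totals, for illustration only.
players = {
    "P1": {"points": 30000, "rebounds": 13000, "assists": 5200},
    "P2": {"points": 26000, "rebounds": 12500, "assists": 5100},
    "P3": {"points": 28000, "rebounds": 9000, "assists": 7000},
}

# Three-attribute subspace: only P1 and P2 qualify.
full = matching_objects(
    players, {"points": 25000, "rebounds": 12000, "assists": 5000}
)
# One-attribute subspace: all three players qualify.
points_only = matching_objects(players, {"points": 25000})
print(full, points_only)
```

Changing the subspace changes how exclusive the claim is: in this toy data the three-attribute claim is "one of only two", while the points-only claim holds for everyone.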