The Web has been rapidly "deepened" by the prevalence of databases online. With the potentially unlimited information hidden behind their query interfaces, this "deep Web" of searchable databases is clearly an important frontier for data access. This paper surveys this relatively unexplored frontier, measuring characteristics pertinent to both exploring and ...
Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between score and uncertainty makes traditional techniques inapplicable. We introduce new probabilistic formulations for top-k queries. Our formulations are based on a "marriage" of traditional top-k semantics and possible ...
Web page classification is one of the essential techniques for Web mining. Specifically, classifying Web pages of a user-interesting class is the first step of mining interesting information from the Web. However, constructing a classifier for an interesting class requires laborious pre-processing such as collecting positive and negative training examples.
Recently, the Web has been rapidly "deepened" by many searchable databases online, where data are hidden behind query forms. For modelling and integrating Web databases, the very first challenge is to understand what a query interface says, or what query capabilities a source supports. Such automatic extraction of interface semantics is challenging, ...
Users' locations are important to many applications such as targeted advertisement and news recommendation. In this paper, we focus on the problem of profiling users' home locations in the context of a social network (Twitter). The problem is nontrivial, because signals that may help to identify a user's location are scarce and noisy. We propose a unified ...
This paper introduces RankSQL, a system that provides a systematic and principled framework to support efficient evaluations of ranking (top-k) queries in relational database systems (RDBMS), by extending relational algebra and query optimization. Previously, top-k query processing has been studied in the middleware scenario or in RDBMS in a ...
Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, the problem of matching multiple schemas has essentially relied on finding pairwise-attribute correspondence. This paper proposes a different approach, motivated by integrating large numbers of data sources on the Internet. On this "deep Web," we observe ...
The Web has been rapidly "deepened" by myriad searchable databases online, where data are hidden behind query interfaces. Toward large scale integration over this "deep Web," we have been building the MetaQuerier system, for both exploring (to find) and integrating (to query) databases on the Web. As an interim report, first, this paper proposes our ...
Web page classification is one of the essential techniques for Web mining because classifying Web pages of an interesting class is often the first step of mining the Web. However, constructing a classifier for an interesting class requires laborious preprocessing such as collecting positive and negative training examples. For instance, in order to construct ...