Kevin Chen-Chuan Chang

Top-k processing in uncertain databases is semantically and computationally different from traditional top-k processing. The interplay between score and uncertainty makes traditional techniques inapplicable. We introduce new probabilistic formulations for top-k queries. Our formulations are based on "marriage" of traditional top-k semantics and possible...
The Web has been rapidly "deepened" by the prevalence of databases online. With the potentially unlimited information hidden behind their query interfaces, this "deep Web" of searchable databases is clearly an important frontier for data access. This paper surveys this relatively unexplored frontier, measuring characteristics pertinent to both exploring and...
Web page classification is one of the essential techniques for Web mining. Specifically, classifying Web pages of a user-interesting class is the first step of mining interesting information from the Web. However, constructing a classifier for an interesting class requires laborious pre-processing such as collecting positive and negative training examples...
This paper introduces RankSQL, a system that provides a systematic and principled framework to support efficient evaluation of ranking (top-k) queries in relational database systems (RDBMS), by extending relational algebra and query optimization. Previously, top-k query processing has been studied in the middleware scenario or in RDBMS in a...
Recently, the Web has been rapidly "deepened" by many searchable databases online, where data are hidden behind query forms. For modelling and integrating Web databases, the very first challenge is to understand what a query interface says, or what query capabilities a source supports. Such automatic extraction of interface semantics is challenging,...
Users' locations are important to many applications such as targeted advertisement and news recommendation. In this paper, we focus on the problem of profiling users' home locations in the context of a social network (Twitter). The problem is nontrivial, because signals that may help to identify a user's location are scarce and noisy. We propose a unified...
The Web has been rapidly "deepened" by myriad searchable databases online, where data are hidden behind query interfaces. Toward large scale integration over this "deep Web," we have been building the MetaQuerier system, for both exploring (to find) and integrating (to query) databases on the Web. As an interim report, first, this paper proposes our...
Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, the problem of matching multiple schemas has essentially relied on finding pairwise-attribute correspondence. This paper proposes a different approach, motivated by integrating large numbers of data sources on the Internet. On this "deep Web," we observe...
As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. While entities appear in many pages,...