XML employs a tree-structured data model, and, naturally, XML queries specify patterns of selection predicates on multiple elements related by a tree structure. Finding all occurrences of such a twig pattern in an XML database is a core operation for XML query processing. Prior work has typically decomposed the twig pattern into binary structural …
1 SQL Expression. In [GIJ+01a, GIJ+01b] we described how to use q-grams in an RDBMS to perform approximate string joins. We also showed how to implement the approximate join using plain SQL queries. Specifically, we described three filters (the count filter, the position filter, and the length filter) that can be used to execute the approximate join efficiently. …
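To make the three filters concrete, here is a minimal Python sketch (not the SQL formulation the abstract refers to). It assumes strings are padded with `q - 1` copies of `#` and `$`, and it uses the commonly stated bound that two strings within edit distance `k` share at least `max(|s1|, |s2|) - 1 - (k - 1) * q` q-grams. The function names, the padding characters, and the default `q = 3` are illustrative assumptions, not taken from the text above.

```python
def qgrams(s, q=3):
    """Return positional q-grams (position, gram) of s.

    Assumption: s is padded with q-1 '#' on the left and q-1 '$' on the
    right, so a string of length n yields n + q - 1 q-grams.
    """
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return [(i, padded[i:i + q]) for i in range(len(padded) - q + 1)]


def passes_filters(s1, s2, k, q=3):
    """Cheap candidate test for edit_distance(s1, s2) <= k.

    Returns False only when the pair cannot be within distance k.
    A True result is a candidate that still needs exact verification.
    """
    # Length filter: lengths of matching strings differ by at most k.
    if abs(len(s1) - len(s2)) > k:
        return False

    # Position filter: only count equal q-grams whose positions differ
    # by at most k. Pairs are counted as a join would count them, so the
    # count may overestimate but never underestimates.
    matches = sum(
        1
        for p1, g1 in qgrams(s1, q)
        for p2, g2 in qgrams(s2, q)
        if g1 == g2 and abs(p1 - p2) <= k
    )

    # Count filter: each edit operation destroys at most q q-grams.
    threshold = max(len(s1), len(s2)) - 1 - (k - 1) * q
    return matches >= threshold
```

For example, `passes_filters("kitten", "sitten", 1)` keeps the pair as a candidate, while `passes_filters("kitten", "garden", 1)` prunes it without computing an edit distance.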
XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree structured relationships. The primitive tree structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing. In this paper, we …
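The two primitive relationships can be illustrated with a naive Python sketch over an in-memory tree, using the standard `xml.etree.ElementTree` module. This is only a brute-force illustration of what the relationships mean, not the join algorithm the paper proposes; the function names and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def ancestor_descendant_pairs(root, anc_tag, desc_tag):
    """All (a, d) pairs where d is a proper descendant of a."""
    pairs = []
    for a in root.iter(anc_tag):
        # Element.iter() yields the element itself first, so skip it.
        for d in a.iter(desc_tag):
            if d is not a:
                pairs.append((a, d))
    return pairs


def parent_child_pairs(root, parent_tag, child_tag):
    """All (p, c) pairs where c is a direct child of p."""
    return [
        (p, c)
        for p in root.iter(parent_tag)
        for c in p  # iterating an Element yields its direct children
        if c.tag == child_tag
    ]


# Hypothetical sample document.
doc = ET.fromstring(
    "<book><title/><chapter><title/>"
    "<section><title/></section></chapter></book>"
)
```

On this sample, `book` has three `title` descendants but only one `title` child, which is the distinction between the two relationships.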
We present TwitterMonitor, a system that performs trend detection over the Twitter stream. The system identifies emerging topics (i.e. 'trends') on Twitter in real time and provides meaningful analytics that synthesize an accurate description of each topic. Users interact with the system by ordering the identified trends using different criteria and …
The problem of obtaining efficient answers to top-k queries has attracted a lot of research attention. Several algorithms and numerous variants of the top-k retrieval problem have been introduced in recent years. The general form of this problem requests the k highest ranked values from a relation, using monotone combining functions on …
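As a baseline illustration of the problem statement, the sketch below scans a whole relation and returns the k highest-scoring rows under a weighted sum, which is monotone when the weights are non-negative. It is a naive full scan rather than any of the algorithms the abstract alludes to; the function names and sample data are assumptions made for this example.

```python
import heapq


def weighted_sum(weights):
    """Build a scoring function; monotone if all weights are >= 0."""
    return lambda row: sum(w * x for w, x in zip(weights, row))


def top_k(rows, k, score):
    """Return the k rows with the highest score, best first (full scan)."""
    return heapq.nlargest(k, rows, key=score)


# Hypothetical relation with two numeric attributes per row.
rows = [(0.9, 0.1), (0.5, 0.5), (0.2, 0.95), (0.8, 0.7)]
best = top_k(rows, 2, weighted_sum((1, 1)))
```

Here `best` holds `(0.8, 0.7)` followed by `(0.2, 0.95)`.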
Histograms are commonly used to capture attribute value distribution statistics for query optimizers. More recently, histograms have also been considered as a way to produce quick approximate answers to decision support queries. This widespread interest in histograms motivates the problem of computing histograms that are good under a given error metric. In …
Large-scale data analysis lies at the core of modern enterprises and scientific research. With the emergence of cloud computing, the use of an analytical query processing infrastructure (e.g., Amazon EC2) can be directly mapped to monetary value. MapReduce has been a popular framework in the context of cloud computing, designed to serve long running queries …
Users often need to optimize the selection of objects by appropriately weighting the importance of multiple object attributes. Such optimization problems appear often in operations research and applied mathematics as well as in everyday life; e.g., a buyer may select a home as a weighted function of a number of attributes like its distance from office, its …
Privacy is a serious concern when microdata need to be released for ad hoc analyses. Simple de-identification has been shown to be inadequate, since privacy can be compromised when quasi-identifiers in a de-identified database are linked with publicly available information. To mitigate the problem, generalization and suppression based approaches (such …