#### Filter Results:

#### Publication Year

2003

2015

#### Publication Type

#### Co-author

#### Key Phrase

#### Publication Venue

Learn More

To deal with data uncertainty, existing probabilistic database systems augment tuples with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself. This approach can severely limit the system's ability to gracefully handle complex or unforeseen types of uncertainty, and does not permit the uncertainty… (More)

This paper deals with detecting change of distribution in multi-dimensional data sets. For a given baseline data set and a set of newly observed data points, we define a statistical test called the <i>density test</i> for deciding if the observed data points are sampled from the underlying distribution that produced the baseline data set. We define a test… (More)

An effective approach to detecting anomalous points in a data setis distance-based outlier detection. This paper describes a simplesampling algorithm to effciently detect distance-based outliers indomains where each and every distance computation is veryexpensive. Unlike any existing algorithms, the sampling algorithmrequires a xed number of distance… (More)

The application of stochastic models and analysis techniques to large datasets is now commonplace. Unfortunately, in practice this usually means extracting data from a database system into an external tool (such as SAS, R, Arena, or Matlab), and then running the analysis there. This extract-and-model paradigm is typically error-prone, slow, does not support… (More)

For a large number of data management problems, it would be very useful to be able to obtain a few samples from a data set, and to use the samples to guess the largest (or smallest) value in the entire data set. Min/max online aggregation, top-k query processing, outlier detection, and distance join are just a few possible applications. This paper details a… (More)

- Florin Rusu, Fei Xu, Luis Leopoldo Perez, Mingxi Wu, Ravi Jampani, Chris Jermaine +1 other
- SIGMOD Conference
- 2008

We demonstrate our prototype of the DBO database system. DBO is designed to facilitate scalable analytic processing over large data archives. DBO's analytic processing performance is competitive with other database systems; however, unlike any other existing research or industrial system, DBO maintains a statistically meaningful guess to the final answer to… (More)

Given a spatial data set placed on an <i>n</i> x <i>n</i> grid, our goal is to find the rectangular regions within which subsets of the data set exhibit anomalous behavior. We develop algorithms that, given any user-supplied arbitrary likelihood function, conduct a likelihood ratio hypothesis test (LRT) over each rectangular region in the grid, rank all of… (More)

- Hazem Elmeleegy, Yinan Li, Yan Qi, Peter Wilmot, Mingxi Wu, Santanu Kolay +2 others
- PVLDB
- 2013

This paper gives an overview of Turn Data Management Platform (DMP). We explain the purpose of this type of platforms, and show how it is positioned in the current digital advertising ecosystem. We also provide a detailed description of the key components in Turn DMP. These components cover the functions of (1) data ingestion and integration , (2) data… (More)

Given a spatial dataset placed on an <i>n</i> ×<i>n</i> grid, our goal is to find the rectangular regions within which subsets of the dataset exhibit anomalous behavior. We develop algorithms that, given any user-supplied arbitrary likelihood function, conduct a likelihood ratio hypothesis test (LRT) over each rectangular region in the grid, rank all… (More)