- Ravi Jampani, Fei Xu, Mingxi Wu, Luis Leopoldo Perez, Chris Jermaine, Peter J. Haas
- SIGMOD Conference
- 2008

To deal with data uncertainty, existing probabilistic database systems augment tuples with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself. This approach can severely limit the system's ability to gracefully handle complex or unforeseen types of uncertainty, and does not permit the uncertainty…

- Xiuyao Song, Mingxi Wu, Chris Jermaine, Sanjay Ranka
- IEEE Transactions on Knowledge and Data…
- 2007

When anomaly detection software is used as a data analysis tool, finding the hardest-to-detect anomalies is not the most critical task. Rather, it is often more important to make sure that those anomalies that are reported to the user are in fact interesting. If too many unremarkable data points are returned to the user labeled as candidate anomalies, the…

- Xiuyao Song, Mingxi Wu, Chris Jermaine, Sanjay Ranka
- KDD
- 2007

This paper deals with detecting change of distribution in multi-dimensional data sets. For a given baseline data set and a set of newly observed data points, we define a statistical test called the *density test* for deciding if the observed data points are sampled from the underlying distribution that produced the baseline data set. We define a test…

- Ravi Jampani, Fei Xu, Mingxi Wu, Luis Leopoldo Perez, Chris Jermaine, Peter J. Haas
- ACM Trans. Database Syst.
- 2011

The application of stochastic models and analysis techniques to large datasets is now commonplace. Unfortunately, in practice this usually means extracting data from a database system into an external tool (such as SAS, R, Arena, or Matlab), and then running the analysis there. This extract-and-model paradigm is typically error-prone, slow, does not support…

- Mingxi Wu, Chris Jermaine
- VLDB
- 2007

For a large number of data management problems, it would be very useful to be able to obtain a few samples from a data set, and to use the samples to guess the largest (or smallest) value in the entire data set. Min/max online aggregation, top-k query processing, outlier detection, and distance join are just a few possible applications. This paper details a…

- Mingxi Wu, Chris Jermaine
- KDD
- 2006

An effective approach to detecting anomalous points in a data set is distance-based outlier detection. This paper describes a simple sampling algorithm to efficiently detect distance-based outliers in domains where each and every distance computation is very expensive. Unlike any existing algorithms, the sampling algorithm requires a fixed number of distance…

- Mingxi Wu, Xiuyao Song, Chris Jermaine, Sanjay Ranka, John Gums
- KDD
- 2009

Given a spatial data set placed on an *n* × *n* grid, our goal is to find the rectangular regions within which subsets of the data set exhibit anomalous behavior. We develop algorithms that, given any user-supplied arbitrary likelihood function, conduct a likelihood ratio hypothesis test (LRT) over each rectangular region in the grid, rank all of…

The integration of heterogeneous legacy databases requires understanding of database structure and content. We previously developed a theoretical and software infrastructure to support the extraction of schema and business rule information from legacy sources, combining database reverse engineering with semantic analysis of associated application code…

- Hazem Elmeleegy, Yinan Li, +5 authors Songting Chen
- PVLDB
- 2013

This paper gives an overview of Turn Data Management Platform (DMP). We explain the purpose of this type of platform, and show how it is positioned in the current digital advertising ecosystem. We also provide a detailed description of the key components in Turn DMP. These components cover the functions of (1) data ingestion and integration, (2) data…

- Florin Rusu, Fei Xu, +4 authors Alin Dobra
- SIGMOD Conference
- 2008

We demonstrate our prototype of the DBO database system. DBO is designed to facilitate scalable analytic processing over large data archives. DBO's analytic processing performance is competitive with other database systems; however, unlike any other existing research or industrial system, DBO maintains a statistically meaningful guess to the final answer to…