Zhibo Chen

Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. In general, a significant manual effort is required to build ...
Statistical tests represent an important technique used to formulate and validate hypotheses on a dataset. They are particularly useful in the medical domain, where hypotheses link disease with medical measurements, risk factors, and treatment. In this paper, we propose to compute parametric statistical tests treating patient records as elements in a ...
Distributed relational databases are used by different organizations located at multiple sites that work together on common projects. In this article, we focus on distributed relational databases with incomplete and inconsistent content. We propose to measure referential integrity errors in them for integration and interoperability purposes. We propose ...
Since the early 1990s, On-Line Analytical Processing (OLAP) has been a well-studied research topic that has focused on implementation outside the database, either with OLAP servers or entirely within the client computers. Our approach involves the computation and storage of OLAP cubes using User-Defined Functions (UDFs) with a database management system. ...
A federated database consists of several loosely integrated databases, where each database may contain hundreds of tables and thousands of columns, interrelated by complex foreign key relationships. In general, there exist many semistructured data elements outside the database, represented by documents (files), created and updated by multiple users and ...
In On-Line Analytical Processing (OLAP), users explore a database cube with roll-up and drill-down operations in order to find interesting results. Most approaches rely on simple aggregations and value comparisons in order to validate findings. In this work, we propose to combine OLAP dimension lattice traversal and statistical tests to discover significant ...
OLAP is a set of database exploratory techniques to efficiently retrieve multiple sets of aggregations from a large dataset. Generally, these techniques have either involved the use of an external OLAP server or required the dataset to be exported to a specialized OLAP tool for more efficient processing. In this work, we show that OLAP techniques can be ...
Ontologies are knowledge conceptualizations of a particular domain and are commonly represented with hierarchies. While final ontologies appear deceptively simple on paper, building ontologies is a time-consuming task that is normally performed by natural language processing techniques or schema matching. On the other hand, OLAP cubes are most ...
Association rules are a technique for detecting patterns within the items of a dataset. The constrained version applies several restrictions that reduce the number of rules and also help improve performance. On the other hand, OLAP statistical tests integrate exploratory On-Line Analytical Processing techniques with statistical tests. It uses a ...