Zhibo Chen

Learn More
Statistical tests represent an important technique used to formulate and validate hypotheses on a dataset. They are particularly useful in the medical domain, where hypotheses link disease with medical measurements, risk factors, and treatment. In this paper, we propose to compute parametric statistical tests treating patient records as elements in a(More)
Distributed relational databases are used by different organizations located at multiple sites that work together on common projects. In this article, we focus on distributed relational databases with incomplete and inconsistent content. We propose to measure referential integrity errors in them for integration and interoperability purposes. We propose(More)
Since the early 1990s, On-Line Analytical Processing (OLAP) has been a well studied research topic that has focused on implementation outside the database, either with OLAP servers or entirely within the client computers. Our approach involves the computation and storage of OLAP cubes using User-Defined Functions (UDF) with a database management system.(More)
—Preparing a data set for analysis is generally the most time consuming task in a data mining project, requiring many complex SQL queries, joining tables and aggregating columns. Existing SQL aggregations have limitations to prepare data sets because they return one column per aggregated group. In general, a significant manual effort is required to build(More)
A federated database consists of several loosely integrated databases, where each database may contain hundreds of tables and thousands of columns,interrelated by complex foreign key relationships. In general, there exists a lot of semistructured data elements outside the database represented by documents (files), created and updated by multiple users and(More)
OLAP is a set of database exploratory techniques to efficiently retrieve multiple sets of aggregations from a large dataset. Generally, these techniques have either involved the use of an external OLAP server or required the dataset to be exported to a specialized OLAP tool for more efficient processing. In this work, we show that OLAP techniques can be(More)
In On-Line Analytical Processing (OLAP), users explore a database cube with roll-up and drill-down operations in order to find interesting results. Most approaches rely on simple aggregations and value comparisons in order to validate findings. In this work, we propose to combine OLAP dimension lattice traversal and statistical tests to discover significant(More)
Ontologies are knowledge conceptualizations of a particular domain and are commonly represented with hierarchies. While final ontologies appear deceivingly simple on paper, building ontologies represents a time-consuming task that is normally performed by natural language processing techniques or schema matching. On the other hand, OLAP cubes are most(More)