Jinguo You

Closed cubing is a very efficient data cube compression technique proposed recently in the literature. It losslessly condenses a group of cells into one cell when these cells have the same aggregate value, while preserving roll-up/drill-down semantics. Despite its importance, parallel closed cubing solutions for huge data sets have not been well studied so far to…
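The condensation idea can be illustrated with a naive, single-machine sketch. It assumes a COUNT measure, a small in-memory relation, and the usual cover-based definition of closedness: a cell is closed if no more specific cell covers exactly the same base tuples. The functions `cover`, `closure`, and `closed_cube` are hypothetical names for illustration. This is not the parallel method of the paper, only a brute-force enumeration of which cells survive compression.

```python
from itertools import combinations

STAR = "*"  # wildcard value meaning "aggregated over this dimension"


def cover(cell, rows):
    """Indices of the base rows matched by a cell ('*' matches any value)."""
    return frozenset(
        i for i, row in enumerate(rows)
        if all(c == STAR or c == v for c, v in zip(cell, row))
    )


def closure(cell, rows):
    """Most specific cell with the same cover: fill every '*' on which all covered rows agree."""
    idx = cover(cell, rows)
    out = list(cell)
    for d in range(len(cell)):
        if out[d] == STAR:
            values = {rows[i][d] for i in idx}
            if len(values) == 1:
                out[d] = values.pop()
    return tuple(out)


def closed_cube(rows):
    """Return {closed_cell: count}, enumerating every non-empty cell of the full cube."""
    dims = len(rows[0])
    cells = set()
    for row in rows:
        # Each row generates 2^dims cells (each dimension either kept or replaced by '*').
        for k in range(dims + 1):
            for keep in combinations(range(dims), k):
                cells.add(tuple(row[d] if d in keep else STAR for d in range(dims)))
    result = {}
    for c in cells:
        cc = closure(c, rows)  # cells sharing a cover collapse onto one closed cell
        result[cc] = len(cover(cc, rows))
    return result


rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c1")]
cube = closed_cube(rows)
```

For these three rows the full cube has 18 non-empty cells, but only 6 are closed; for example, `("a1", "*", "*")` and `("*", "b1", "*")` both collapse into `("a1", "b1", "*")` with count 2.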
With the rapid development of the Internet, traditional knowledge management systems, which rely on centralized control, cannot adapt to distributed environments: a collaborative environment may contain more than one knowledge base, including personal knowledge bases, and knowledge is involved in every phase of the implementation process. Currently, to solve…
As data warehouses grow in size, ensuring adequate database performance becomes a major challenge. This paper presents a solution, called HDW, that builds on Google infrastructure such as GFS, Bigtable, and MapReduce to build and manage a large-scale distributed data warehouse for high-performance OLAP analysis. In addition, HDW provides a standard XMLA interface for…
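As a rough illustration of how a MapReduce-style system can answer an OLAP roll-up, the sketch below simulates the map, shuffle, and reduce phases in a single process for a group-by SUM. The record layout, field names, and functions (`map_phase`, `shuffle`, `reduce_phase`, `mapreduce_rollup`) are hypothetical; HDW's actual job design is not described in this abstract.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record, group_by, measure):
    """Emit (group key, measure value) for one fact record."""
    key = tuple(record[d] for d in group_by)
    yield key, record[measure]


def shuffle(pairs):
    """Group intermediate values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups


def reduce_phase(key, values):
    """Aggregate all values for one key (SUM here)."""
    return key, sum(values)


def mapreduce_rollup(records, group_by, measure):
    pairs = chain.from_iterable(map_phase(r, group_by, measure) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


facts = [
    {"region": "east", "year": 2009, "sales": 10},
    {"region": "east", "year": 2010, "sales": 5},
    {"region": "west", "year": 2009, "sales": 7},
]
by_region = mapreduce_rollup(facts, ("region",), "sales")
```

Because SUM is distributive, a real deployment could also pre-aggregate in a combiner before the shuffle; the sketch omits that step for brevity.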
Given a collection of sets and a query set, a T-Overlap query identifies all sets that share at least T elements with the query. The T-Overlap query is the foundation of set similarity queries and joins, and it plays an important role in web data querying and processing, such as behavior analysis of web users and near-duplicate detection of web documents. To…
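To make the query definition concrete, here is a minimal baseline that answers a T-Overlap query with an inverted index and one counter per set (the well-known ScanCount approach). It is an assumed illustration of the problem, not the algorithm proposed in the paper; `build_index` and `t_overlap` are hypothetical names.

```python
from collections import defaultdict


def build_index(sets):
    """Inverted index: element -> ids of the sets that contain it."""
    index = defaultdict(list)
    for sid, s in enumerate(sets):
        for e in set(s):  # deduplicate so each element counts once per set
            index[e].append(sid)
    return index


def t_overlap(index, num_sets, query, t):
    """Return ids of all sets sharing at least t elements with the query."""
    counts = [0] * num_sets
    for e in set(query):
        for sid in index.get(e, ()):  # .get avoids inserting new keys into the defaultdict
            counts[sid] += 1
    return [sid for sid, c in enumerate(counts) if c >= t]


sets = [{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}, {"a", "c", "d", "e"}]
index = build_index(sets)
result = t_overlap(index, len(sets), {"a", "c", "d"}, 2)
```

Here `result` is `[0, 1, 3]`: sets 0 and 1 each share two elements with the query and set 3 shares three. The cost grows with the total length of the scanned posting lists, which is what more refined T-Overlap algorithms try to reduce.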