Jilian Zhang

Learn More
Query answers from servers operated by third parties need to be verified, as the third parties may not be trusted or their servers may be compromised. Most of the existing authentication methods construct validity proofs based on the Merkle hash tree (MHT). The MHT, however, imposes severe concurrency constraints that slow down data updates. We introduce a(More)
Maintaining frequent itemsets (patterns) is one of the most important issues faced by the data mining community. While many algorithms for pattern discovery have been developed, relatively little work has been reported on mining dynamic databases, a major area of application in this field. In this paper, a new algorithm, namely the Efficient Dynamic(More)
We propose an efficient nonparametric missing value imputation method based on clustering, called CMI (Clustering-based Missing value Imputation), for dealing with missing values in target attributes. In our approach, we impute the missing values of an instance A with plausible values that are generated from the data in the instances which do not contain(More)
Missing value imputation is an actual yet challenging issue confronted by machine learning and data mining. Existing missing value imputation is a procedure that replaces the missing values in a dataset by some plausible values. The plausible values are generally generated from the dataset using a deterministic, or random method. In this paper we propose a(More)
A top-k query shortlists the k records in a dataset that best match the user's preferences. To indicate her preferences, the user typically determines a numeric weight for each data dimension (i.e., attribute). We refer to these weights collectively as the <i>query vector</i>. Based on this vector, each data record is implicitly mapped to a score value (via(More)
Data mining and machine learning must confront the problem of pattern maintenance because data updating is a fundamental operation in data management. Most existing data-mining algorithms assume that the database is static, and a database update requires rediscovering all the patterns by scanning the entire old and new data. While there are many efficient(More)
Missing data imputation is an important issue in machine learning and data mining. In this paper, we propose a new and efficient imputation method for a kind of missing data: semi-parametric data. Our imputation method aims at making an optimal evaluation about Root Mean Square Error (RMSE), distribution function and quantile after missing-data are imputed.(More)