Learn More
Often, in the real world, entities have two or more representations in databases. Duplicate records do not share a common key and/or they contain errors that make duplicate matching a difcult task. Errors are introduced as the result of transcription errors, incomplete information, lack of standard formats or any combination of these factors. In this(More)
Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically , these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most(More)
Large repositories of data contain sensitive information that must be protected against unauthorized access. The protection of the confidentiality of this information has been a long-term goal for the database security research community and for the government statistical agencies. Recent advances in data mining and machine learning algorithms, have(More)
Business-to-Business (B2B) technologies pre-date the Web. They have existed for at least as long as the Inter-net. B2B applications were among the first to take advantage of advances in computer networking. The Electronic Data Interchange (EDI) business standard is an illustration of such an early adoption of the advances in computer networking. The(More)
In this paper, we introduce Quasi Serializability, a correctness criterion for concurrency control in heterogeneous distributed database environments. A global history is quasi serializable if it is (conflict) equivalent to a quasi serial history in which global transactions are submitted serially. Quasi serializability theory is an extension of(More)
Large repositories of data contain sensitive information which must be protected against unauthorized access. The protection of the conndentiality of this information has been a long-term goal for the database security research community and the government statistical agencies. Recent advances, in data mining and machine learning algorithms, have increased(More)
Ranking is an important property that needs to be fully supported by current relational query engines. Recently, several rank-join query operators have been proposed based on rank aggregation algorithms. Rank-join operators progressively rank the join results while performing the join operation. The new operators have a direct impact on traditional query(More)
Data cleaning is a vital process that ensures the quality of data stored in real-world databases. Data cleaning problems are frequently encountered in many research areas, such as knowledge discovery in databases, data ware-housing, system integration and e-services. The process of identifying the record pairs that represent the same entity (duplicate(More)
Integrating data from multiple sources has been a longstanding challenge in the database community. Techniques such as privacy-preserving data mining promises privacy, but assume data has integration has been accomplished. Data integration methods are seriously hampered by inability to share the data to be integrated. This paper lays out a privacy framework(More)
—Periodicity mining is used for predicting trends in time series data. Discovering the rate at which the time series is periodic has always been an obstacle for fully automated periodicity mining. Existing periodicity mining algorithms assume that the periodicity rate (or simply the period) is user-specified. This assumption is a considerable limitation,(More)