To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an approach that (1) is flexible with respect to the outlier definition, (2) computes the result in-network to reduce both bandwidth and energy consumption, (3) uses only single-hop communication, thus permitting very simple node failure detection and message…
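The abstract stresses flexibility with respect to the outlier definition. One way to picture that property is a detector that takes the ranking function as a parameter. The minimal Python sketch below is centralized; it is not the in-network, single-hop algorithm the abstract describes. The names `knn_distance`, `avg_knn_distance`, and `top_n_outliers`, and both ranking functions, are illustrative assumptions.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, ...]
# A ranking function scores how outlying point p is with respect to the other points.
RankFn = Callable[[Point, List[Point]], float]


def knn_distance(k: int) -> RankFn:
    """Hypothetical definition 1: distance to the k-th nearest other point."""
    def rank(p: Point, others: List[Point]) -> float:
        dists = sorted(math.dist(p, q) for q in others)
        return dists[k - 1] if len(dists) >= k else math.inf
    return rank


def avg_knn_distance(k: int) -> RankFn:
    """Hypothetical definition 2: mean distance to the k nearest other points."""
    def rank(p: Point, others: List[Point]) -> float:
        dists = sorted(math.dist(p, q) for q in others)[:k]
        return sum(dists) / len(dists) if dists else math.inf
    return rank


def top_n_outliers(points: List[Point], rank: RankFn, n: int) -> List[Point]:
    """Score every point with the supplied definition and return the n highest-scoring ones."""
    scored = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]
        scored.append((rank(p, others), i))
    scored.sort(reverse=True)
    return [points[i] for _, i in scored[:n]]


readings = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(top_n_outliers(readings, knn_distance(1), 1))
print(top_n_outliers(readings, avg_knn_distance(2), 1))
```

Swapping `knn_distance` for `avg_knn_distance` changes the outlier definition without touching the detector itself.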
Peer-to-peer (P2P) networks are gaining popularity in many applications such as file sharing, e-commerce, and social networking, many of which deal with rich, distributed data sources that can benefit from data mining. P2P networks are, in fact, well-suited to distributed data mining (DDM), which deals with the problem of data analysis in environments with…
Mining for associations between items in large transactional databases is a central problem in the field of knowledge discovery. When the database is partitioned among several share-nothing machines, the problem can be addressed using distributed data mining algorithms. One such algorithm, called CD, was proposed by Agrawal and Shafer in [1] and was later…
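CD (Count Distribution) parallelizes the candidate-counting step of Apriori: in each pass every machine counts the support of the same candidate itemsets over its own partition, and the local counts are summed so that every machine learns the global support. The single-process Python sketch below models that exchange as a sum of `Counter` objects rather than real message passing; the function names and the join-and-prune candidate generation are illustrative choices, not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def local_counts(partition: List[Itemset], candidates: Set[Itemset]) -> Counter:
    """One node's pass over its own transactions, counting every candidate itemset."""
    counts: Counter = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:
                counts[candidate] += 1
    return counts


def generate_candidates(frequent: Set[Itemset], k: int) -> Set[Itemset]:
    """Join frequent (k-1)-itemsets into k-itemsets whose (k-1)-subsets are all frequent."""
    candidates = set()
    for a in frequent:
        for b in frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def count_distribution(partitions: List[List[Itemset]], min_support: float) -> Dict[Itemset, int]:
    """Simulate CD: every node counts the same candidates locally, then counts are summed."""
    total = sum(len(p) for p in partitions)
    threshold = min_support * total
    candidates = {frozenset([item]) for p in partitions for t in p for item in t}
    frequent_counts: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        global_counts: Counter = Counter()
        for partition in partitions:  # in a real deployment these run in parallel
            global_counts.update(local_counts(partition, candidates))  # models the count exchange
        frequent = {c for c in candidates if global_counts[c] >= threshold}
        frequent_counts.update({c: global_counts[c] for c in frequent})
        k += 1
        candidates = generate_candidates(frequent, k)
    return frequent_counts


node_a = [frozenset("ab"), frozenset("abc")]
node_b = [frozenset("ac"), frozenset("b")]
print(count_distribution([node_a, node_b], min_support=0.5))
```

Because every node holds the full candidate set, only counts cross the network in each pass, never transactions.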
We extend the problem of association rule mining, a key data mining problem, to systems in which the database is partitioned among a very large number of computers dispersed over a wide area. Such computing systems include grid computing platforms, federated database systems, and peer-to-peer computing environments. The scale of these systems poses…
This paper offers a scalable and robust distributed algorithm for decision tree induction in large peer-to-peer (P2P) environments. Computing a decision tree in such large distributed systems using standard centralized algorithms can be very communication-expensive and impractical because of their synchronization requirements. The problem becomes even more…
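The abstract does not name a split criterion. Assuming information gain, and assuming each peer can report counts keyed by (attribute, value, class label), the quantity the peers must agree on can be sketched as below. Merging all counts in one place is exactly the expensive, synchronized step the paper seeks to avoid; the sketch only shows what is being computed. All names here are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List

# Assumed per-peer statistics: Counter keyed by (attribute, value, class_label) -> local record count.


def merge(stats_per_peer: Iterable[Counter]) -> Counter:
    """Add the peers' counts; counts are additive, so the merged table equals the global one."""
    total: Counter = Counter()
    for stats in stats_per_peer:
        total.update(stats)
    return total


def entropy(label_counts: List[int]) -> float:
    n = sum(label_counts)
    return -sum(c / n * math.log2(c / n) for c in label_counts if c)


def information_gain(stats: Counter, attribute: str) -> float:
    """Entropy of the class labels minus the weighted entropy after splitting on `attribute`."""
    by_value = {}
    for (attr, value, label), count in stats.items():
        if attr == attribute:
            by_value.setdefault(value, Counter())[label] += count
    overall: Counter = Counter()
    for label_counts in by_value.values():
        overall.update(label_counts)
    n = sum(overall.values())
    remainder = sum(
        sum(lc.values()) / n * entropy(list(lc.values())) for lc in by_value.values()
    )
    return entropy(list(overall.values())) - remainder


def best_split(stats_per_peer: Iterable[Counter], attributes: List[str]) -> str:
    stats = merge(stats_per_peer)
    return max(attributes, key=lambda a: information_gain(stats, a))


peer_1 = Counter({("outlook", "sunny", "no"): 2, ("outlook", "rain", "yes"): 2,
                  ("windy", "true", "no"): 1, ("windy", "true", "yes"): 1,
                  ("windy", "false", "no"): 1, ("windy", "false", "yes"): 1})
peer_2 = Counter({("outlook", "sunny", "no"): 1, ("outlook", "rain", "yes"): 1,
                  ("windy", "true", "yes"): 1, ("windy", "false", "no"): 1})
print(best_split([peer_1, peer_2], ["outlook", "windy"]))
```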
We present a new distributed association rule mining (D-ARM) algorithm that demonstrates superlinear speed-up with the number of computing nodes. The algorithm is the first D-ARM algorithm to perform a single scan over the database. As such, its performance is unmatched by any previous algorithm. Scale-up experiments over standard synthetic benchmarks…
We present a local distributed algorithm for a general Majority Voting problem with different and time-variable voting powers and vote splits, arbitrary and dynamic interconnection topologies and link delays, and any fixed majority threshold. The algorithm combines a novel, efficient anytime spanning forest algorithm, which may also have applications elsewhere,…
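A common way to state the majority-vote predicate, assumed here rather than quoted from the paper: node i holds `power` votes, of which `yes_votes` are in favor, and with threshold λ the majority is reached when Σ yes_i ≥ λ · Σ power_i, that is, when the sum of per-node excesses yes_i − λ · power_i is non-negative. Because each node contributes one independent term, a change in one node's voting power or vote split only changes its own term. The sketch evaluates the predicate centrally; it is not the paper's local algorithm or its spanning-forest construction, and `Voter`, `excess`, and `majority_reached` are hypothetical names.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Iterable


@dataclass
class Voter:
    yes_votes: int  # votes this node casts in favor (its vote split)
    power: int      # total votes this node holds (its voting power)


def excess(voters: Iterable[Voter], threshold: Fraction) -> Fraction:
    """Sum over nodes of yes_votes - threshold * power; each node contributes one term."""
    return sum((v.yes_votes - threshold * v.power for v in voters), Fraction(0))


def majority_reached(voters: Iterable[Voter], threshold: Fraction) -> bool:
    return excess(voters, threshold) >= 0


network = [Voter(yes_votes=3, power=5), Voter(yes_votes=1, power=5)]
print(majority_reached(network, Fraction(1, 2)))  # 4 of 10 votes in favor: False

network[1] = Voter(yes_votes=3, power=5)  # one node changes its vote split
print(majority_reached(network, Fraction(1, 2)))  # 6 of 10 votes in favor: True
```

Using `Fraction` for the threshold keeps the comparison exact when the excess is exactly zero.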
In a facility location problem (FLP) we are given a set of facilities and a set of clients, each of which is to be served by one facility. The goal is to decide which subset of facilities to open so that the clients are served at minimal cost. In this paper we investigate the FLP in a setting where the cost depends on data known only to the…
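The abstract states the FLP objective only informally. Under the standard uncapacitated formulation (an assumption here), opening facility j costs `opening_cost[j]`, serving client i from facility j costs `service_cost[i][j]`, and the cost of an open set is its total opening cost plus each client's cheapest service cost among the open facilities. The sketch evaluates that objective and finds the optimum by brute force; it ignores the distributed, privately held data the paper is actually about, and the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Set, Tuple


def total_cost(open_set: Sequence[int], opening_cost: List[float],
               service_cost: List[List[float]]) -> float:
    """Opening costs of the chosen facilities plus each client's cheapest open facility."""
    if not open_set:
        return float("inf")  # no facility open: clients cannot be served
    opening = sum(opening_cost[j] for j in open_set)
    service = sum(min(row[j] for j in open_set) for row in service_cost)
    return opening + service


def best_facilities(opening_cost: List[float],
                    service_cost: List[List[float]]) -> Tuple[Set[int], float]:
    """Exhaustive search over all non-empty subsets; exponential, so only for tiny instances."""
    if not opening_cost:
        raise ValueError("need at least one facility")
    best: Optional[Tuple[Set[int], float]] = None
    for r in range(1, len(opening_cost) + 1):
        for subset in combinations(range(len(opening_cost)), r):
            cost = total_cost(subset, opening_cost, service_cost)
            if best is None or cost < best[1]:
                best = (set(subset), cost)
    return best


# Two facilities, two clients, each client close to a different facility.
print(best_facilities([3.0, 3.0], [[1.0, 5.0], [5.0, 1.0]]))
```

Here opening both facilities (cost 8) beats opening either one alone (cost 9), because the extra opening cost is smaller than the service cost it saves.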
Data privacy concerns are a major threat to the widespread deployment of data grids in domains such as health care and finance. We propose a novel technique for obtaining knowledge, by way of a data mining model, from a data grid while ensuring that privacy is cryptographically secure. To the best of our knowledge, all previous approaches for solving this…