István Hegedüs

Machine learning over fully distributed data poses an important problem in peer-to-peer (P2P) applications. In this model, each network node holds a single data record, and the raw data cannot be moved due to privacy considerations. User profiles, ratings, history, or sensor readings are typical examples of such data. This problem is …
Fully distributed data mining algorithms build global models over large amounts of data distributed over a large number of peers in a network, without moving the data itself. In the area of peer-to-peer (P2P) networks, such algorithms have various applications in P2P social networking and in trackerless BitTorrent communities. The difficulty of the …
We focus on the problem of data mining over large-scale fully distributed databases, where each node stores only one data record. We assume that a data record is never allowed to leave the node it is stored at. Possible motivations for this assumption include privacy or a lack of a centralized infrastructure. To tackle this problem, earlier we proposed the …
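To make the one-record-per-node model concrete, here is a minimal simulation, assuming a complete network in which any node can contact any other. It uses push-pull gossip averaging, a standard technique swapped in purely for illustration and not necessarily the method the truncated abstract goes on to describe; `gossip_average` and its parameters are hypothetical. Nodes exchange only their current estimates, never their raw records, yet all estimates approach the global mean.

```python
import random


def gossip_average(values, rounds=50, seed=0):
    """Push-pull gossip averaging (illustrative sketch, hypothetical helper).

    Node i privately holds values[i]. In every round, each node contacts one
    random peer and both replace their estimates with the pairwise average.
    Only estimates are exchanged; the raw records never leave their node.
    Assumes at least two nodes and a fully connected network.
    """
    rng = random.Random(seed)
    estimates = [float(v) for v in values]  # each node starts from its own record
    n = len(estimates)
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # random peer different from node i
            # Pairwise averaging keeps the sum (and so the mean) unchanged
            avg = (estimates[i] + estimates[j]) / 2.0
            estimates[i] = estimates[j] = avg
    return estimates


if __name__ == "__main__":
    records = [3.0, 7.0, 1.0, 9.0, 5.0, 2.0, 8.0, 4.0]
    print(gossip_average(records))
```

Because each exchange replaces two estimates by their average, the sum of all estimates is preserved at every step while the disagreement between nodes shrinks.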
Offering personalized recommendation as a service in fully distributed applications such as file-sharing, distributed search, social networking, P2P television, etc., is an increasingly important problem. In such networked environments, recommender algorithms should meet the same performance and reliability requirements as in centralized services. To achieve …
The multi-armed bandit problem has attracted remarkable attention in the machine learning community and many efficient algorithms have been proposed to handle the so-called exploitation-exploration dilemma in various bandit setups. At the same time, significantly less effort has been devoted to adapting bandit algorithms to particular architectures, such …
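The exploitation-exploration dilemma can be illustrated with epsilon-greedy, one of the simplest bandit strategies, chosen here as a generic example rather than as the algorithm the paper adapts. The sketch assumes Bernoulli rewards with made-up success probabilities; `epsilon_greedy` and its parameters are hypothetical.

```python
import random


def epsilon_greedy(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on Bernoulli arms (illustrative sketch, hypothetical helper).

    With probability epsilon the player explores a uniformly random arm;
    otherwise it exploits the arm with the highest empirical mean reward.
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    pulls = [0] * k
    totals = [0.0] * k
    for t in range(steps):
        if t < k:
            arm = t  # pull every arm once so each empirical mean is defined
        elif rng.random() < epsilon:
            arm = rng.randrange(k)  # explore
        else:
            arm = max(range(k), key=lambda a: totals[a] / pulls[a])  # exploit
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0  # Bernoulli draw
        pulls[arm] += 1
        totals[arm] += reward
    return pulls


if __name__ == "__main__":
    print(epsilon_greedy([0.2, 0.5, 0.8]))
```

A larger `epsilon` spends more pulls on exploration; a smaller one exploits sooner but risks settling on a suboptimal arm when early estimates are noisy.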
In fully distributed networks, data mining is an important tool for monitoring, control, and offering personalized services to users. The underlying data model can change as a function of time according to periodic (daily, weekly) patterns, sudden changes, or long-term transformations of the environment or the system itself. For a large space of the …
Low-rank matrix approximation is an important tool in data mining with a wide range of applications, including recommender systems, clustering, and identifying topics in documents. The problem we tackle is implementing singular value decomposition (SVD), a popular method for low-rank approximation, in large fully distributed P2P systems in a robust and …
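For reference, a low-rank approximation can be computed centrally. The following is a minimal pure-Python sketch of the rank-1 case using power iteration; it is not the distributed, robust SVD algorithm the abstract refers to. `rank1_approximation` is a hypothetical helper, and it assumes a non-zero matrix whose all-ones start vector is not orthogonal to the top right singular vector.

```python
import math


def rank1_approximation(A, iters=100):
    """Best rank-1 approximation of a dense matrix via power iteration.

    Iterates v <- A^T A v (normalized) to estimate the top right singular
    vector v, then computes A v = sigma * u. By the Eckart-Young theorem,
    sigma * u * v^T is the best rank-1 approximation in the Frobenius norm
    when the top singular value is simple.
    """
    m, n = len(A), len(A[0])
    v = [1.0] * n  # assumed not orthogonal to the top right singular vector
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * Av[i] for i in range(m)) for j in range(n)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))  # zero only for a zero matrix
        v = [x / norm for x in w]
    sigma_u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]  # sigma * u
    return [[sigma_u[i] * v[j] for j in range(n)] for i in range(m)]


if __name__ == "__main__":
    print(rank1_approximation([[3.0, 0.0], [0.0, 1.0]]))
```

Note that A^T(Av) equals the sum over rows a_i of a_i (a_i · v), so each iteration decomposes into per-row contributions; this makes power-iteration-style updates a natural starting point when each node holds one row, although the approach in the abstract may differ.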
Applying sophisticated machine learning techniques to fully distributed data is increasingly important in many applications, such as distributed recommender systems or spam filters. In this type of networked environment, the data model can change dynamically over time (concept drift). Identifying when concept drift has occurred is key to several drift-handling …
Peer-to-peer file-sharing has become increasingly popular over the last decade. In most cases, file-sharing communities provide only minimal functionality, such as search and download. Extra features such as recommendation are difficult to implement because users are typically unwilling to provide sufficient rating information for the items they download. For …
One of the most fundamental data processing approaches is clustering, and this holds in distributed architectures as well. Here, we focus on the problem of designing efficient and fast K-Means approaches that work in fully distributed, asynchronous networks without any central control. We assume that the network has a huge number of computational units (even …
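The core K-Means (Lloyd) update can be written so that its dependence on aggregates is explicit. In this hypothetical sketch each point plays the role of one node's record; `kmeans_step` is an illustrative name, and the centralized loop stands in for whatever aggregation protocol a fully distributed, asynchronous implementation would use, which the abstract does not specify here.

```python
def kmeans_step(points, centroids):
    """One Lloyd (K-Means) iteration as an aggregation of per-point contributions.

    Hypothetical sketch: each point stands for a single node's record. Every
    point is assigned to its nearest centroid (squared Euclidean distance),
    and each new centroid is computed from per-cluster sums and counts only.
    """
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    # In a fully distributed setting this loop would be replaced by an
    # aggregation protocol; here it runs centrally for illustration.
    for p in points:
        nearest = min(
            range(k),
            key=lambda c: sum((p[t] - centroids[c][t]) ** 2 for t in range(d)),
        )
        counts[nearest] += 1
        for t in range(d):
            sums[nearest][t] += p[t]
    # Empty clusters keep their previous centroid
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centroids[c])
        for c in range(k)
    ]


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [10, 10], [10, 11]]
    print(kmeans_step(pts, [[0, 0], [10, 10]]))
```

Since the new centroids depend only on per-cluster sums and counts, these global quantities could in principle be estimated by an averaging protocol rather than by collecting the points in one place.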