Clustering by Passing Messages Between Data Points

  • Brendan J. Frey, Delbert Dueck
  • Science, 2007
  • pp. 972–976
Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such “exemplars” can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called “affinity propagation,” which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged… 
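The real-valued message updates described in the abstract can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: `affinity_propagation` is a hypothetical helper, the damping factor and iteration count are arbitrary choices, and the diagonal of the similarity matrix is set to the median similarity as a shared "preference".

```python
import numpy as np

def affinity_propagation(S, damping=0.9, iters=200):
    """Minimal affinity propagation sketch on a similarity matrix S, where
    S[i, k] is the similarity of point i to candidate exemplar k and the
    diagonal S[k, k] holds each point's preference to be an exemplar."""
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities  a(i, k)
    for _ in range(iters):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[np.arange(n), idx]
        AS[np.arange(n), idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
        R = damping * R + (1 - damping) * R_new
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))       # keep r(k,k) unclipped
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = np.diag(A_new).copy()           # a(k,k) is not clipped at 0
        A_new = np.minimum(A_new, 0)
        A_new[np.arange(n), np.arange(n)] = diag
        A = damping * A + (1 - damping) * A_new
    return (A + R).argmax(axis=1)              # exemplar chosen by each point

# toy data: two tight, well-separated groups of five points each
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(3, 0.1, (5, 2))])
S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # negative squared distances
np.fill_diagonal(S, np.median(S))                    # shared exemplar preference
labels = affinity_propagation(S)                     # one exemplar per group
```

With the preference set to the median similarity, the number of exemplars emerges from the messages rather than being fixed in advance.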

Affinity Propagation: Clustering Data by Passing Messages

This thesis describes a method called “affinity propagation” that simultaneously considers all data points as potential exemplars, exchanging real-valued messages between data points until a high-quality set of exemplars and corresponding clusters gradually emerges.

Local and global approaches of affinity propagation clustering for large scale data

Two variants of AP for grouping large-scale data with a dense similarity matrix are presented: the local approach is partition affinity propagation (PAP) and the global one is landmark affinity propagation (LAP).

Sparse Affinity Propagation for Image Analysis

An algorithm named Sparse Affinity Propagation (SAP) adopts sparse representation coefficients to describe the relationships among data points and outperforms AP and other baseline algorithms in accuracy and robustness for image analysis.

Clustering by fast search and find of density peaks

A method in which the cluster centers are recognized as local density maxima that are far away from any points of higher density, and the algorithm depends only on the relative densities rather than their absolute values.
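The two quantities this method relies on, a point's local density and its distance to the nearest point of higher density, are straightforward to compute directly. The sketch below is an assumption-laden illustration, not the authors' code: the Gaussian-kernel density, the cutoff `d_c`, and the fixed number of centers are choices made here for the example.

```python
import numpy as np

def density_peaks(X, d_c, n_centers):
    """Sketch of density-peaks clustering: centers have both high local
    density rho and a large distance delta to any denser point."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    rho = np.exp(-(D / d_c) ** 2).sum(axis=1) - 1.0   # Gaussian-kernel density
    delta = np.empty(len(X))                          # distance to nearest denser point
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = D[i].max() if higher.size == 0 else D[i, higher].min()
    centers = np.argsort(rho * delta)[-n_centers:]    # high rho AND high delta
    labels = np.full(len(X), -1, dtype=int)
    labels[centers] = np.arange(n_centers)
    order = np.argsort(-rho)                          # descending density
    for pos, i in enumerate(order):
        if labels[i] == -1:
            if pos == 0:                              # densest point not a center
                labels[i] = labels[centers[np.argmin(D[i, centers])]]
            else:
                prev = order[:pos]                    # all denser points
                labels[i] = labels[prev[np.argmin(D[i, prev])]]
    return labels

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
labels = density_peaks(X, d_c=0.3, n_centers=2)
```

Note how only relative densities matter, as the summary says: scaling all of `rho` by a constant leaves both the center choice and the assignment order unchanged.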

An improved affinity propagation clustering algorithm for large-scale data sets

The experimental results show that, compared with the traditional AP and adaptive AP algorithms, the HAP algorithm can greatly reduce clustering time consumption while achieving comparatively better clustering results.

A hierarchical clustering algorithm based on noise removal

A Hierarchical Clustering algorithm Based on Noise Removal (HCBNR) that is robust against noise points and good at discovering clusters with arbitrary shapes is presented.

Clustering of Categorical Data for Anonymization and Anomaly Detection

To help the algorithm handle homogeneous data, I designed new versions of f(θ), the function used as the criterion for choosing the best clusters to merge; these prevent ROCK from generating very large clusters, improve execution time in some cases, and improve the results in general.

Clustering for point pattern data

This paper proposes two approaches for clustering point patterns, a non-parametric method based on novel distances for sets and a model-based approach, formulated via random finite set theory and solved by the Expectation-Maximization algorithm.

Some methods for classification and analysis of multivariate observations

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give
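The k-means process alternates two steps: assign each sample to its nearest center, then move each center to the mean of its assigned samples. A plain numpy sketch of that loop (Lloyd's algorithm); the helper name `k_means`, the iteration count, and the random initialization are illustrative choices, not part of the original paper.

```python
import numpy as np

def k_means(X, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)
        # update step: each center moves to the mean of its points
        for j in range(k):
            if (labels == j).any():       # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
labels, centers = k_means(X, k=2)
```

Each iteration can only decrease the within-cluster sum of squares, which is why the process converges, although only to a local optimum that depends on the initial centers.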

Normalized cuts and image segmentation

  • Jianbo Shi, J. Malik
  • Computer Science
    Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition
  • 1997
This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, which measures both the total dissimilarity between the different groups and the total similarity within the groups.
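The relaxed normalized-cut criterion reduces to a generalized eigenproblem: the second-smallest eigenvector of (D - W) y = λ D y, where W is the affinity matrix and D its degree matrix, gives a real-valued indicator that is thresholded to split the graph. A small numpy sketch under those definitions; the simple sign threshold and the Gaussian affinities are simplifying assumptions for the example.

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Split a graph with affinity matrix W by the second-smallest
    generalized eigenvector of (D - W) y = lambda * D y."""
    d = W.sum(axis=1)
    D_is = np.diag(1.0 / np.sqrt(d))
    L_sym = D_is @ (np.diag(d) - W) @ D_is   # symmetric normalized Laplacian
    vals, vecs = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    y = D_is @ vecs[:, 1]                    # 2nd-smallest generalized eigenvector
    return (y > 0).astype(int)               # simple sign split

# two nearby groups of points, connected by weak Gaussian affinities
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(1.5, 0.1, (5, 2))])
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-D2)                              # Gaussian affinities
labels = normalized_cut_bipartition(W)
```

In practice one searches over several thresholds of `y` for the one with the lowest normalized-cut value rather than splitting at zero, and recurses on the two sides for more than two segments.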

A Constant-Factor Approximation Algorithm for the k-Median Problem

This work presents the first constant-factor approximation algorithm for the metric k-median problem, improving upon the best previously known result of O(log k log log k), which was obtained by refining and derandomizing a randomized O(log n log log n)-approximation algorithm of Bartal.

Factor graphs and the sum-product algorithm

A generic message-passing algorithm, the sum-product algorithm, operates in a factor graph and computes, either exactly or approximately, various marginal functions derived from the global function.
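On a tree-structured factor graph the computed marginals are exact: each variable's marginal is the product of the messages arriving from its neighboring factors, and each factor-to-variable message sums the factor over its other variables. A tiny numpy example on a three-variable chain, checked against brute-force summation of the global function; the factor tables are arbitrary numbers chosen for illustration.

```python
import numpy as np

# chain factor graph: x1 -- f12 -- x2 -- f23 -- x3, each variable binary
f12 = np.array([[0.9, 0.1], [0.2, 0.8]])   # factor over (x1, x2)
f23 = np.array([[0.7, 0.3], [0.4, 0.6]])   # factor over (x2, x3)

# factor-to-variable messages into x2: sum the factor over its other variable
m_f12_x2 = f12.sum(axis=0)                 # sum over x1
m_f23_x2 = f23.sum(axis=1)                 # sum over x3

# marginal of x2 = product of incoming messages, normalized
p_x2 = m_f12_x2 * m_f23_x2
p_x2 /= p_x2.sum()                         # p_x2 = [0.55, 0.45]

# brute-force check: sum the global function over x1 and x3
joint = np.einsum('ab,bc->abc', f12, f23)  # joint[x1, x2, x3]
p_x2_brute = joint.sum(axis=(0, 2))
p_x2_brute /= p_x2_brute.sum()
assert np.allclose(p_x2, p_x2_brute)
```

On graphs with cycles the same updates are iterated ("loopy" belief propagation), which is where the approximate case in the summary arises.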

Constructing free-energy approximations and generalized belief propagation algorithms

This work explains how to obtain region-based free-energy approximations that improve on the Bethe approximation, together with corresponding generalized belief propagation (GBP) algorithms, and presents empirical results showing that GBP can significantly outperform BP.

Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs

Most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified.

A Split-Merge Markov chain Monte Carlo Procedure for the Dirichlet Process Mixture Model

A split-merge Markov chain algorithm is proposed to address the problem of inefficient sampling for conjugate Dirichlet process mixture models by employing a new technique in which an appropriate proposal for splitting or merging components is obtained by using a restricted Gibbs sampling scan.

Neural networks and physical systems with emergent collective computational abilities.

  • J. Hopfield
  • Computer Science
    Proceedings of the National Academy of Sciences of the United States of America
  • 1982
A model of a system having a large number of simple equivalent components, based on aspects of neurobiology but readily adapted to integrated circuits, produces a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size.

NCBI Reference Sequence Project: update and current status

The goal of the NCBI Reference Sequence (RefSeq) project is to provide the single best non-redundant and comprehensive collection of naturally occurring biological molecules, representing the central

Elements of Information Theory

The authors examine the role of entropy, inequality, and randomness in the design and construction of codes in a rapidly changing environment.