# Clustering by Passing Messages Between Data Points

@article{Frey2007ClusteringBP, title={Clustering by Passing Messages Between Data Points}, author={Brendan J. Frey and Delbert Dueck}, journal={Science}, year={2007}, volume={315}, pages={972--976} }

Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such “exemplars” can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called “affinity propagation,” which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged…
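The message exchange described in the abstract can be sketched with the two updates usually given for affinity propagation. A responsibility r(i,k) is sent from point i to candidate exemplar k, and an availability a(i,k) is sent back from k to i. A point k is read off as an exemplar when a(k,k) + r(k,k) > 0, and every other point joins its most similar exemplar. The plain-Python sketch below is illustrative, not the authors' reference code. The function name `affinity_propagation`, the damping factor of 0.5, the fixed 200 iterations, and setting every preference s(k,k) to the median similarity are assumptions of this example.

```python
import statistics


def affinity_propagation(S, damping=0.5, iterations=200):
    """Cluster points given an n x n similarity matrix S (hypothetical helper).

    S[k][k] is the 'preference' of point k to be an exemplar.
    Returns (exemplars, labels), where labels[i] is the exemplar of point i.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            first = scores[best]
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = second if k == best else first
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [], []
    # Each non-exemplar joins the exemplar it is most similar to.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
n = len(points)
# Similarity: negative squared distance.
S = [[-(p - q) ** 2 for q in points] for p in points]
pref = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
for k in range(n):
    S[k][k] = pref
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)
```

Raising the shared preference tends to yield more exemplars and lowering it fewer. Damping the updates is a common way to avoid oscillating messages.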

## 6,268 Citations

### Affinity Propagation: Clustering Data by Passing Messages

- Computer Science
- 2009

This thesis describes a method called “affinity propagation” that simultaneously considers all data points as potential exemplars, exchanging real-valued messages between data points until a high-quality set of exemplars and corresponding clusters gradually emerges.

### Local and global approaches of affinity propagation clustering for large scale data

- Computer Science
- ArXiv
- 2009

Two variants of AP for grouping large-scale data with a dense similarity matrix are presented: the local approach, partition affinity propagation (PAP), and the global method, landmark affinity propagation (LAP).

### Sparse Affinity Propagation for Image Analysis

- Computer Science
- J. Softw.
- 2014

An algorithm named Sparse Affinity Propagation (SAP) adopts sparse representation coefficients to depict the relationship among data points and is superior to AP and other baseline algorithms in accuracy and robustness for image analysis.

### Clustering by fast search and find of density peaks

- Computer Science
- Science
- 2014

A method in which the cluster centers are recognized as local density maxima that are far away from any points of higher density, and the algorithm depends only on the relative densities rather than their absolute values.
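The two quantities this method relies on can be sketched directly. The first is a local density rho, here the count of neighbors closer than a cutoff d_c. The second is delta, the distance from a point to the nearest point of higher density. Centers are the points where both are large. In this minimal sketch, the helper name `density_peaks`, the strict cutoff count, breaking density ties by index, and ranking candidates by rho * delta are assumptions.

```python
import math


def density_peaks(points, d_c):
    """Return (rho, delta) lists for the density-peaks criterion (hypothetical helper).

    rho[i]: number of other points strictly closer than the cutoff d_c.
    delta[i]: distance to the nearest point ranked denser than i
    (equal densities are ranked by index); for the top-ranked point,
    its largest distance to any point.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, n):
        i = order[rank]
        delta[i] = min(dist[i][j] for j in order[:rank])
    return rho, delta


# Two square blobs, each with an extra point at its center.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
rho, delta = density_peaks(pts, d_c=1.0)
# Heuristic: take the two points with the largest rho * delta as centers.
gamma = [r * d for r, d in zip(rho, delta)]
centers = sorted(sorted(range(len(pts)), key=lambda i: -gamma[i])[:2])
print(centers)  # [4, 9]
```

Only the ordering of densities enters delta, which matches the summary's point that relative rather than absolute densities matter.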

### An improved affinity propagation clustering algorithm for large-scale data sets

- Computer Science
- 2013 Ninth International Conference on Natural Computation (ICNC)
- 2013

The experimental results show that, compared with the traditional AP and adaptive AP algorithms, the HAP algorithm greatly reduces clustering time while producing relatively better clustering results.

### Beyond Affinity Propagation: Message Passing Algorithms for Clustering

- Computer Science
- 2012

This thesis develops several extensions of affinity propagation that provide clustering tools that go beyond the capabilities of the basic affinity propagation algorithm, and generalize it to various problems of interest in machine learning.

### A hierarchical clustering algorithm based on noise removal

- Computer Science
- Int. J. Mach. Learn. Cybern.
- 2019

A Hierarchical Clustering algorithm Based on Noise Removal (HCBNR) that is robust against noise points and good at discovering clusters with arbitrary shapes is presented.

### Clustering of Categorical Data for Anonymization and Anomaly Detection

- Computer Science
- 2018

To help the algorithm handle homogeneous data, I designed new versions of f(θ), a function used as a criterion for choosing the best clusters to merge. The new versions prevent ROCK from generating very large clusters and improve the execution time in some cases, while improving the results in general.
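For context on the f(θ) mentioned here: in the original ROCK algorithm (Guha, Rastogi and Shim), the goodness of merging clusters C_i and C_j divides their cross-link count by an estimate of expected links, (n_i + n_j)^(1+2f(θ)) - n_i^(1+2f(θ)) - n_j^(1+2f(θ)), with f(θ) = (1 - θ)/(1 + θ) suggested for market-basket data. The sketch below takes f as a parameter so that alternative versions, such as those the thesis describes, could be plugged in. The names `f_rock` and `goodness`, the candidate pairs, and θ = 0 (chosen only to give round numbers) are hypothetical.

```python
def f_rock(theta):
    # f(theta) = (1 - theta) / (1 + theta), the form suggested in ROCK for market-basket data.
    return (1 - theta) / (1 + theta)


def goodness(links, n_i, n_j, theta, f=f_rock):
    """Goodness of merging clusters of sizes n_i and n_j that share `links` cross links."""
    e = 1 + 2 * f(theta)
    expected = (n_i + n_j) ** e - n_i ** e - n_j ** e
    return links / expected


# Candidate merges: name -> (cross links, size of first cluster, size of second).
candidates = {"A+B": (3, 1, 1), "A+C": (4, 2, 2)}
best = max(candidates, key=lambda name: goodness(*candidates[name], theta=0.0))
print(best)  # A+B
```

Although "A+C" has more raw links, its larger expected-link term lowers its score. This normalization is what keeps ROCK from always merging into ever-larger clusters.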

## 38 References

### Some methods for classification and analysis of multivariate observations

- Mathematics
- 1967

The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give…
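The "k-means" process can be sketched in a sequential form. The first k points seed the means, and each later point joins its nearest mean, which is then moved toward it by a running-average update. This is a minimal sketch in the spirit of the 1967 description. The helper name `macqueen_kmeans`, the seeding rule, and the final nearest-mean relabeling are assumptions. The now more common batch (Lloyd) variant instead alternates full assignment and mean-recomputation passes.

```python
import math


def macqueen_kmeans(points, k):
    """Sequential k-means (hypothetical helper): returns (means, labels)."""
    means = [list(p) for p in points[:k]]  # first k points seed the means
    counts = [1] * k
    for p in points[k:]:
        j = min(range(k), key=lambda c: math.dist(p, means[c]))
        counts[j] += 1
        # Running mean: m <- m + (x - m) / count
        means[j] = [m + (x - m) / counts[j] for m, x in zip(means[j], p)]
    # Final pass: label every point by its nearest mean.
    labels = [min(range(k), key=lambda c: math.dist(p, means[c])) for p in points]
    return means, labels


pts = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
means, labels = macqueen_kmeans(pts, 2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```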

### Normalized cuts and image segmentation

- Computer Science
- Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition
- 1997

This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
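The normalized cut criterion for a bipartition (A, B) of a weighted graph is Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V). Here cut sums the weights crossing the partition, and assoc(A, V) sums the weights from nodes in A to all nodes. The sketch below only evaluates this criterion for a given split; it does not implement the paper's eigenvector-based method for searching over splits. The function name `ncut` and the toy graph are hypothetical.

```python
def ncut(W, A):
    """Normalized cut of the bipartition (A, V minus A) for a symmetric weight matrix W."""
    n = len(W)
    A = set(A)
    B = set(range(n)) - A
    cut = sum(W[u][v] for u in A for v in B)
    assoc_A = sum(W[u][t] for u in A for t in range(n))  # assoc(A, V)
    assoc_B = sum(W[u][t] for u in B for t in range(n))  # assoc(B, V)
    return cut / assoc_A + cut / assoc_B


# Two triangles (internal weight 1.0) joined by one weak edge of weight 0.1.
W = [[0.0] * 6 for _ in range(6)]
for u, v, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[u][v] = W[v][u] = w

print(ncut(W, {0, 1, 2}))  # about 0.0328: split along the weak edge
print(ncut(W, {0}))        # about 1.196: cutting off one vertex
```

Dividing by assoc penalizes splitting off a small, isolated group, which a plain minimum cut would favor.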

### A Constant-Factor Approximation Algorithm for the k-Median Problem

- Computer Science, Mathematics
- J. Comput. Syst. Sci.
- 2002

This work presents the first constant-factor approximation algorithm for the metric k-median problem and improves upon the best previously known result of O(log k log log k), which was obtained by refining and derandomizing a randomized O(log n log log n)-approximation algorithm of Bartal.
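To make the problem concrete, consider distances d(i, j). The k-median problem chooses k centers so that the sum, over all points, of the distance to the nearest chosen center is minimized. The exhaustive search below only illustrates that objective on a toy input. It is exponential in k and is not the approximation algorithm of this paper. The helper names are hypothetical.

```python
from itertools import combinations


def kmedian_cost(D, centers):
    """Sum over all points of the distance to the nearest chosen center."""
    return sum(min(D[i][c] for c in centers) for i in range(len(D)))


def kmedian_exhaustive(D, k):
    """Exact k-median by trying every k-subset of points as centers (toy inputs only)."""
    return min(combinations(range(len(D)), k), key=lambda C: kmedian_cost(D, C))


xs = [0, 1, 2, 10, 11, 12]
D = [[abs(a - b) for b in xs] for a in xs]
best = kmedian_exhaustive(D, 2)
print(best, kmedian_cost(D, best))  # (1, 4) 4
```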

### Factor graphs and the sum-product algorithm

- Computer Science
- IEEE Trans. Inf. Theory
- 2001

A generic message-passing algorithm, the sum-product algorithm, operates in a factor graph and computes, either exactly or approximately, various marginal functions derived from the global function.
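On a tree-structured factor graph the sum-product marginals are exact. A chain x_0 - x_1 - ... - x_{n-1} with unary and pairwise factors is the simplest case: one forward and one backward sweep of messages give every single-variable marginal. The sketch below assumes all variables share the same number of states. It checks the result against brute-force summation of the global function. The helper names and factor values are hypothetical. On graphs with cycles the same local rule is only approximate.

```python
from itertools import product


def chain_marginals(unary, pairwise):
    """Single-variable marginals on a chain via sum-product messages.

    unary[i][s]      : local factor on x_i taking state s
    pairwise[i][s][t]: factor linking x_i = s and x_{i+1} = t
    """
    n, m = len(unary), len(unary[0])
    fwd = [[1.0] * m for _ in range(n)]  # message reaching x_i from the left
    bwd = [[1.0] * m for _ in range(n)]  # message reaching x_i from the right
    for i in range(1, n):
        fwd[i] = [sum(fwd[i - 1][s] * unary[i - 1][s] * pairwise[i - 1][s][t]
                      for s in range(m)) for t in range(m)]
    for i in range(n - 2, -1, -1):
        bwd[i] = [sum(pairwise[i][s][t] * unary[i + 1][t] * bwd[i + 1][t]
                      for t in range(m)) for s in range(m)]
    marginals = []
    for i in range(n):
        belief = [fwd[i][s] * unary[i][s] * bwd[i][s] for s in range(m)]
        z = sum(belief)
        marginals.append([b / z for b in belief])
    return marginals


def brute_force_marginals(unary, pairwise):
    """Same marginals by summing the global function over every configuration."""
    n, m = len(unary), len(unary[0])
    totals = [[0.0] * m for _ in range(n)]
    for xs in product(range(m), repeat=n):
        w = 1.0
        for i in range(n):
            w *= unary[i][xs[i]]
        for i in range(n - 1):
            w *= pairwise[i][xs[i]][xs[i + 1]]
        for i in range(n):
            totals[i][xs[i]] += w
    return [[v / sum(row) for v in row] for row in totals]


unary = [[0.6, 0.4], [1.0, 1.0], [1.0, 1.0]]
pairwise = [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.4, 0.6]]]
mp = chain_marginals(unary, pairwise)
bf = brute_force_marginals(unary, pairwise)
print(mp[1])  # approximately [0.62, 0.38]
```

Message passing costs O(n m^2) here, while the brute-force check grows as m^n. Avoiding that exponential sum is the point of exploiting the factorization.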

### Constructing free-energy approximations and generalized belief propagation algorithms

- Computer Science
- IEEE Transactions on Information Theory
- 2005

This work explains how to obtain region-based free energy approximations that improve the Bethe approximation, and corresponding generalized belief propagation (GBP) algorithms, and describes empirical results showing that GBP can significantly outperform BP.

### Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs

- Biology
- Nature Genetics
- 2005

Most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified.

### A Split-Merge Markov chain Monte Carlo Procedure for the Dirichlet Process Mixture Model

- Computer Science
- 2004

A split-merge Markov chain algorithm is proposed to address the problem of inefficient sampling for conjugate Dirichlet process mixture models by employing a new technique in which an appropriate proposal for splitting or merging components is obtained by using a restricted Gibbs sampling scan.

### Neural networks and physical systems with emergent collective computational abilities.

- Computer Science
- Proceedings of the National Academy of Sciences of the United States of America
- 1982

A model of a system having a large number of simple equivalent components, based on aspects of neurobiology but readily adapted to integrated circuits, produces a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size.
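Content-addressable recall of this kind can be sketched with a small network. It stores patterns in symmetric Hebbian (outer-product) weights with a zero diagonal. Units are then updated asynchronously until none changes, so a corrupted probe settles back onto the stored pattern. This sketch uses +1/-1 states rather than the 0/1 states of the original paper, a common equivalent reformulation. The helper names, the two stored patterns, and the sweep limit are hypothetical.

```python
def hebbian_weights(patterns):
    """Outer-product (Hebbian) weights with a zero diagonal; patterns use +1/-1."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W


def recall(W, state, max_sweeps=10):
    """Asynchronous threshold updates in index order until no unit changes."""
    s = list(state)
    n = len(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            h = sum(W[i][j] * s[j] for j in range(n))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = hebbian_weights([p1, p2])
probe = [-1] + p1[1:]  # p1 with its first unit flipped
print(recall(W, probe) == p1)  # True
```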

### NCBI Reference Sequence Project: update and current status

- Biology
- Nucleic Acids Res.
- 2003

The goal of the NCBI Reference Sequence (RefSeq) project is to provide the single best non-redundant and comprehensive collection of naturally occurring biological molecules, representing the central…

### Elements of Information Theory

- Computer Science
- 1991

The authors examine the role of entropy, inequalities, and randomness in the design and construction of codes.
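One concrete link between entropy and code design is that the expected codeword length of a uniquely decodable binary code is at least the source entropy H(p) = -Σ p log2 p. The sketch below compares H(p) with the average length of a Huffman code, swapped in here as one standard construction. For the dyadic distribution used, the two are equal. The helper names and the distribution are hypothetical.

```python
import heapq
import math


def entropy_bits(probs):
    """Shannon entropy H(p) = -sum p log2 p, in bits (zero-probability terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p_a, _, syms_a = heapq.heappop(heap)
        p_b, _, syms_b = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every codeword inside them.
        for s in syms_a + syms_b:
            lengths[s] += 1
        heapq.heappush(heap, (p_a + p_b, counter, syms_a + syms_b))
        counter += 1
    return lengths


probs = [0.5, 0.25, 0.125, 0.125]
lengths = huffman_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, lengths))
print(lengths, entropy_bits(probs), avg_len)  # [1, 2, 3, 3] 1.75 1.75
```

For non-dyadic distributions the Huffman average length exceeds H(p), but by less than one bit.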