Clustering by Passing Messages Between Data Points

@article{Frey2007ClusteringBP,
  title={Clustering by Passing Messages Between Data Points},
  author={Brendan J. Frey and Delbert Dueck},
  journal={Science},
  year={2007},
  volume={315},
  pages={972--976}
}
Clustering data by identifying a subset of representative examples is important for processing sensory signals and detecting patterns in data. Such “exemplars” can be found by randomly choosing an initial subset of data points and then iteratively refining it, but this works well only if that initial choice is close to a good solution. We devised a method called “affinity propagation,” which takes as input measures of similarity between pairs of data points. Real-valued messages are exchanged…
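The abstract above is truncated, so the following is only a minimal sketch of the kind of message passing it describes, not the authors' reference implementation. It assumes the commonly described two-message formulation ("responsibilities" and "availabilities"), damped updates, a fixed iteration count, negative squared distance as the similarity, and the median off-diagonal similarity as each point's self-similarity ("preference"). The function name `affinity_propagation` and the parameters `damping` and `n_iter` are illustrative choices, not taken from the text.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, n_iter=200):
    """Sketch of affinity propagation on an n x n similarity matrix S (n >= 2).

    S[i, k] is the similarity of point i to candidate exemplar k; the
    diagonal S[k, k] holds the preference (assumed set by the caller).
    Returns (exemplar indices, cluster label per point).
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)

    for _ in range(n_iter):
        # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        AS = A + S
        best = AS.argmax(axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new

        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[rows, rows].copy()
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new

    # A point is taken as an exemplar when a(k, k) + r(k, k) > 0.
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = S[:, exemplars].argmax(axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


# Toy usage (made-up 1-D data): two well-separated groups of points.
x = np.array([0.0, 0.1, 0.25, 10.0, 10.3, 10.4])
S = -(x[:, None] - x[None, :]) ** 2
np.fill_diagonal(S, np.median(S[~np.eye(len(x), dtype=bool)]))
exemplars, labels = affinity_propagation(S)
print(exemplars, labels)
```

The number of clusters is not passed in; in this sketch it is governed by the diagonal of `S`, so raising or lowering the preference would be expected to yield more or fewer exemplars.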
Clustering by propagating probabilities between data points
TLDR
A graph-based clustering algorithm called "probability propagation," which is able to identify clusters having spherical shapes as well as clusters having non-spherical shapes, is proposed.
Affinity Propagation: Clustering Data by Passing Messages
TLDR
This thesis describes a method called “affinity propagation” that simultaneously considers all data points as potential exemplars, exchanging real-valued messages between data points until a high-quality set of exemplars and corresponding clusters gradually emerges.
Local and global approaches of affinity propagation clustering for large scale data
TLDR
Two variants of AP for grouping large-scale data with a dense similarity matrix are presented: a local approach, partition affinity propagation (PAP), and a global approach, landmark affinity propagation (LAP).
Sparse Affinity Propagation for Image Analysis
TLDR
An algorithm named Sparse Affinity Propagation (SAP) is proposed, which uses sparse representation coefficients to describe the relationships among data points and is reported to be superior to AP and other baseline algorithms for image analysis in accuracy and robustness.
Clustering by fast search and find of density peaks
TLDR
A method in which the cluster centers are recognized as local density maxima that are far away from any points of higher density, and the algorithm depends only on the relative densities rather than their absolute values.
An improved affinity propagation clustering algorithm for large-scale data sets
TLDR
The experimental results show that, compared with the traditional AP and adaptive AP algorithms, the HAP algorithm greatly reduces clustering time while producing relatively better clustering results.
A hierarchical clustering algorithm based on noise removal
TLDR
A Hierarchical Clustering algorithm Based on Noise Removal (HCBNR) that is robust against noise points and good at discovering clusters with arbitrary shapes is presented.
Beyond affinity propagation: message passing algorithms for clustering
TLDR
This thesis develops several extensions of affinity propagation that provide clustering tools that go beyond the capabilities of the basic affinity propagation algorithm, and generalize it to various problems of interest in machine learning.
Clustering of Categorical Data for Anonymization and Anomaly Detection
The field of data analysis has exploded in recent years thanks to the huge wealth of information that can be collected through the Internet and other similar ways of gathering data. Machine learning…
Clustering for point pattern data
TLDR
This paper proposes two approaches for clustering point patterns, a non-parametric method based on novel distances for sets and a model-based approach, formulated via random finite set theory and solved by the Expectation-Maximization algorithm.

References

SHOWING 1-10 OF 48 REFERENCES
Some methods for classification and analysis of multivariate observations
The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give…
Normalized cuts and image segmentation
  • Jianbo Shi, J. Malik
  • Mathematics, Computer Science
  • Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition
  • 1997
TLDR
This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.
A Constant-Factor Approximation Algorithm for the k-Median Problem
TLDR
This work presents the first constant-factor approximation algorithm for the metric k-median problem, improving upon the best previously known result of O(log k log log k), which was obtained by refining and derandomizing a randomized O(log n log log n)-approximation algorithm of Bartal.
Factor graphs and the sum-product algorithm
TLDR
A generic message-passing algorithm, the sum-product algorithm, is presented; it operates in a factor graph and computes, either exactly or approximately, various marginal functions derived from the global function.
Constructing free-energy approximations and generalized belief propagation algorithms
TLDR
This work explains how to obtain region-based free energy approximations that improve the Bethe approximation, and corresponding generalized belief propagation (GBP) algorithms, and describes empirical results showing that GBP can significantly outperform BP.
Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs
TLDR
Most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified.
A Split-Merge Markov chain Monte Carlo Procedure for the Dirichlet Process Mixture Model
This article proposes a split-merge Markov chain algorithm to address the problem of inefficient sampling for conjugate Dirichlet process mixture models. Traditional Markov chain Monte Carlo methods…
Neural networks and physical systems with emergent collective computational abilities.
  • J. Hopfield
  • Computer Science, Medicine
  • Proceedings of the National Academy of Sciences of the United States of America
  • 1982
TLDR
A model of a system having a large number of simple equivalent components, based on aspects of neurobiology but readily adapted to integrated circuits, produces a content-addressable memory which correctly yields an entire memory from any subpart of sufficient size.
NCBI Reference Sequence Project: update and current status
The goal of the NCBI Reference Sequence (RefSeq) project is to provide the single best non-redundant and comprehensive collection of naturally occurring biological molecules, representing the central…
Good Error-Correcting Codes Based on Very Sparse Matrices
  • D. MacKay
  • Mathematics, Computer Science
  • IEEE Trans. Inf. Theory
  • 1999
TLDR
It is proved that sequences of codes exist which, when optimally decoded, achieve information rates up to the Shannon limit, and experimental results for binary-symmetric channels and Gaussian channels demonstrate that practical performance substantially better than that of standard convolutional and concatenated codes can be achieved.