Recombinator-k-Means: An Evolutionary Algorithm That Exploits k-Means++ for Recombination

  • Carlo Baldassi
  • Published 1 May 2019
  • Computer Science
  • IEEE Transactions on Evolutionary Computation
We introduce an evolutionary algorithm called recombinator-$k$-means for optimizing the highly nonconvex $k$-means problem. Its defining feature is that its crossover step involves all the members of the current generation, stochastically recombining them with a repurposed variant of the $k$-means++ seeding algorithm. The recombination also uses a…
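The generational loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's actual implementation: the truncated abstract omits details such as the exact weighting scheme used in the recombination, and the helper names, toy data, population size, and generation count here are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def lloyd(X, C, iters=10):
    """Plain Lloyd (k-means) refinement from the given centers."""
    for _ in range(iters):
        lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([X[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(len(C))])
    return C

def cost(X, C):
    """k-means objective: sum of squared distances to the nearest center."""
    return ((X[:, None] - C[None]) ** 2).sum(-1).min(1).sum()

def recombine(pool, k):
    """Crossover step: k-means++-style D^2 sampling restricted to the
    pooled centroids of the entire current generation."""
    C = [pool[rng.integers(len(pool))]]
    for _ in range(k - 1):
        d2 = ((pool[:, None] - np.array(C)[None]) ** 2).sum(-1).min(1)
        C.append(pool[rng.choice(len(pool), p=d2 / d2.sum())])
    return np.array(C)

# Toy data: three Gaussian blobs.
X = np.vstack([rng.normal(c, 0.2, size=(40, 2)) for c in [(0, 0), (4, 0), (2, 3)]])
k, pop_size = 3, 6
gen = [lloyd(X, X[rng.choice(len(X), k, replace=False)]) for _ in range(pop_size)]
for _ in range(3):              # a few generations
    pool = np.vstack(gen)       # the crossover involves ALL current members
    gen = [lloyd(X, recombine(pool, k)) for _ in range(pop_size)]
best = min(gen, key=lambda C: cost(X, C))
```

Because recombination samples from the pooled centroids with D² weighting, well-optimized centroids from different parents tend to be combined into diverse, spread-out children.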


Systematically and efficiently improving $k$-means initialization by pairwise-nearest-neighbor smoothing

A meta-method for initializing (seeding) the k-means clustering algorithm, called PNN-smoothing, which consists of splitting a given dataset into J random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method.
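The split–cluster–merge idea can be sketched as follows. This is a simplified reading of the summary above, not the paper's implementation: the actual PNN criterion weights merges by cluster size and variance, whereas this sketch fuses the nearest pair of centers with equal initial weights, and all names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def lloyd(X, C, iters=10):
    """Plain k-means refinement from the given centers."""
    for _ in range(iters):
        lab = ((X[:, None] - C[None]) ** 2).sum(-1).argmin(1)
        C = np.array([X[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(len(C))])
    return C

def pnn_merge(centers, k):
    """Greedy pairwise-nearest-neighbor merging: repeatedly fuse the two
    closest centers into their weighted mean until only k remain."""
    C = list(map(np.asarray, centers))
    w = [1.0] * len(C)
    while len(C) > k:
        A = np.array(C)
        D = ((A[:, None] - A[None]) ** 2).sum(-1)
        np.fill_diagonal(D, np.inf)          # ignore self-distances
        i, j = np.unravel_index(D.argmin(), D.shape)
        merged = (w[i] * A[i] + w[j] * A[j]) / (w[i] + w[j])
        wm = w[i] + w[j]
        for idx in sorted((i, j), reverse=True):
            C.pop(idx); w.pop(idx)
        C.append(merged); w.append(wm)
    return np.array(C)

# PNN-smoothing sketch: cluster J random subsets, then merge the pooled centroids.
k, J = 3, 4
X = np.vstack([rng.normal(c, 0.2, size=(40, 2)) for c in [(0, 0), (5, 0), (0, 5)]])
subsets = np.array_split(rng.permutation(len(X)), J)
pool = np.vstack([lloyd(X[idx], X[idx][rng.choice(len(idx), k, replace=False)])
                  for idx in subsets])
seed = pnn_merge(pool, k)        # smoothed initialization...
final = lloyd(X, seed)           # ...then refined on the full dataset
```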

Research on Improvement Strategy of Clustering Algorithm Based on Density Parameter Optimization Algorithm

  • Chuan Wu
  • Computer Science
    2022 IEEE 2nd International Conference on Data Science and Computer Application (ICDSCA)
  • 2022
This paper proposes an improvement strategy for clustering algorithms based on a density-parameter optimization algorithm, which optimizes the clustering method and clustering characteristics so that the algorithm can be better applied in the data-analysis process.

Parallel Random Swap: An Efficient and Reliable Clustering Algorithm in Java

Optimization method of edge computing terminal deployment considering node division in Electric Internet of Things

An optimization method is proposed for the deployment of edge-computing terminals in the Electric Internet of Things that takes node division into account and aims to minimize task-processing delay.

An efficient annealing-assisted differential evolution for multi-parameter adaptive latent factor analysis

Experimental results of both adaptive and non-adaptive state-of-the-art methods on industrial HDI datasets illustrate that ADMA achieves a desirable global optimum with reasonable overhead and prevails over competing methods in predicting the missing data in HDI matrices.

Genetic algorithm with deterministic crossover for vector quantization

  • P. Fränti
  • Computer Science
    Pattern Recognit. Lett.
  • 2000

k-means++: the advantages of careful seeding

By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.
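The seeding rule itself fits in a few lines; the sketch below is a minimal illustration of the D² sampling that the guarantee refers to, with a hypothetical function name and toy data.

```python
import numpy as np

def kmeans_pp_seed(X, k, rng):
    """k-means++ seeding: the first center is uniform over the data;
    each subsequent center is a data point drawn with probability
    proportional to its squared distance to the nearest center chosen
    so far (the D^2 weighting)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = ((X[:, None] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(0)
# Three well-separated blobs: the D^2 weighting makes the seeding very
# likely to pick one point per blob, unlike uniform random initialization.
X = np.vstack([rng.normal(c, 0.1, size=(50, 2)) for c in [(0, 0), (6, 0), (0, 6)]])
C = kmeans_pp_seed(X, 3, rng)
```

Note that every seeded center is an actual data point, which is what recombinator-$k$-means repurposes when it seeds from a pool of parent centroids instead of from the raw data.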

How much can k-means be improved by using better initialization and repeats?

  • Pattern Recognition
  • 2019

Efficiency of random swap clustering

The main results are that the expected time complexity of the random swap algorithm has (1) linear dependency on the number of data vectors, (2) quadratic dependency on the number of clusters, and (3) inverse dependency on the size of the neighborhood.
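The algorithm being analyzed can be sketched as follows: replace one randomly chosen centroid with a random data point, refine briefly, and accept the swap only if the cost improves. This is a minimal version with hypothetical parameter choices (trial count, refinement depth); the complexity results above concern how many such trials are expected to be needed.

```python
import numpy as np

rng = np.random.default_rng(2)

def cost(X, C):
    """k-means objective: sum of squared distances to the nearest center."""
    return ((X[:, None] - C[None]) ** 2).sum(-1).min(1).sum()

def random_swap(X, k, trials=50):
    """Random swap clustering sketch: trial swaps accepted greedily."""
    C = X[rng.choice(len(X), k, replace=False)].copy()
    best = cost(X, C)
    for _ in range(trials):
        cand = C.copy()
        cand[rng.integers(k)] = X[rng.integers(len(X))]   # the random swap
        for _ in range(2):                                # brief Lloyd refinement
            lab = ((X[:, None] - cand[None]) ** 2).sum(-1).argmin(1)
            cand = np.array([X[lab == j].mean(0) if np.any(lab == j) else cand[j]
                             for j in range(k)])
        c = cost(X, cand)
        if c < best:                                      # keep only improvements
            C, best = cand, c
    return C, best

X = np.vstack([rng.normal(c, 0.2, size=(30, 2)) for c in [(0, 0), (5, 0), (0, 5)]])
C, final_cost = random_swap(X, 3)
```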

Multiparent recombination in evolutionary computing

A survey of multiparent operators introduced over the years in evolutionary computing is given, and the traditional mutation-or-crossover debate is reformulated in the light of such operators.

A study on the effect of multi-parent recombination in real coded genetic algorithms

  • S. TsutsuiA. Ghosh
  • Computer Science
    1998 IEEE International Conference on Evolutionary Computation Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98TH8360)
  • 1998
The results showed clearly that multi-parent recombination leads to better performance, although the performance improvement of the different techniques was found to be problem-dependent.

K-means properties on six clustering benchmark datasets

The results show that overlap is critical, and that k-means starts to work effectively when the overlap reaches the 4% level.

Sparse Embedded k-Means Clustering

A sparse embedded $k$-means clustering algorithm which requires $\mathcal{O}(\mathrm{nnz}(X))$ time (where $\mathrm{nnz}(X)$ denotes the number of non-zeros in $X$) for fast matrix multiplication, and improves on [1]'s approximation-accuracy results by a factor of one.

Reduced comparison search for the exact GLA

A new method is presented for reducing the number of distance calculations in the generalized Lloyd algorithm (GLA), a widely used method for constructing a codebook in vector quantization; it detects the activity of the code vectors and utilizes it in the classification of the training vectors.