# An improved column generation algorithm for minimum sum-of-squares clustering

@article{Aloise2012AnIC, title={An improved column generation algorithm for minimum sum-of-squares clustering}, author={Daniel Aloise and Pierre Hansen and Leo Liberti}, journal={Mathematical Programming}, year={2012}, volume={131}, pages={195-220} }

Given a set of entities associated with points in Euclidean space, minimum sum-of-squares clustering (MSSC) consists of partitioning this set into clusters such that the sum of squared distances from each point to the centroid of its cluster is minimized. A column generation algorithm for MSSC was given by du Merle et al. in SIAM Journal on Scientific Computing 21:1485–1505. The bottleneck of that algorithm is the resolution of the auxiliary problem of finding a column with negative reduced cost…
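The MSSC objective described in the abstract can be illustrated with a minimal sketch. This is not the paper's column generation algorithm; it only evaluates the objective for a given partition. The function name `mssc_objective` and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def mssc_objective(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid.

    `points` is a sequence of equal-length coordinate tuples and `labels`
    assigns a cluster identifier to each point (hypothetical interface).
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The centroid is the coordinate-wise mean of the cluster's points.
        centroid = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        total += sum((p[j] - centroid[j]) ** 2 for p in members for j in range(dim))
    return total


# Toy data: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = mssc_objective(points, [0, 0, 1, 1])  # groups nearby points together
bad = mssc_objective(points, [0, 1, 0, 1])   # groups distant points together
```

On this toy data the partition that groups nearby points gives a smaller objective value than the one that pairs distant points, which is the quantity an exact MSSC method minimizes over all partitions.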

## 81 Citations

Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering

- Computer Science
- J. Glob. Optim.
- 2011

A reformulation-linearization based branch-and-bound algorithm for minimum sum-of-squares clustering, claimed to solve instances with up to 1,000 points, is investigated in further detail, and some of the computational experiments reported for it are reproduced.

J-means and I-means for minimum sum-of-squares clustering on networks

- Computer Science
- Optim. Lett.
- 2017

Experimental results indicate that the implemented VNS-based heuristic for solving the Edge minimum sum-of-squares clustering problem produces the best known results in the literature.

Variable neighborhood search for minimum sum-of-squares clustering on networks

- Computer Science
- Eur. J. Oper. Res.
- 2013

An iterative algorithm for the solution of very large-scale diameter clustering problems

- Computer Science
- 2015

An iterative algorithm for the solution of the diameter minimization clustering problem (DMCP) that can solve problems containing almost 600,000 entities while consuming only moderate amounts of time and memory is introduced.

Side-constrained minimum sum-of-squares clustering: mathematical programming and random projections

- Computer Science, Mathematics
- J. Glob. Optim.
- 2022

It is shown that when side constraints make k-means inapplicable, the proposed methodology—which is easy and fast to implement and deploy—can obtain good solutions in limited amounts of time.

An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering

- Computer Science
- ArXiv
- 2021

This paper presents a new branch-and-bound algorithm for semi-supervised MSSC, where background knowledge is incorporated as pairwise must-link and cannot-link constraints, and efficiently solves real-world instances with up to 800 data points under different combinations of must-link and cannot-link constraints.

Constrained Minimum Sum of Squares Clustering by Constraint Programming

- Computer Science
- CP
- 2015

Experiments on classic datasets show that the proposed global optimization constraint for the Within-Cluster Sum of Squares criterion outperforms the exact approach based on integer linear programming and column generation.

Improving spectral bounds for clustering problems by Lagrangian relaxation

- Computer Science
- Int. Trans. Oper. Res.
- 2011

This paper investigates how to tighten the spectral bounds by using Lagrangian relaxation and subgradient optimization methods.

Review of Basic Local Searches for Solving the Minimum Sum-of-Squares Clustering Problem

- Computer Science
- 2018

In order to show how these local searches can be implemented within a metaheuristic framework, the new heuristics are applied in the local improvement step of two variable neighborhood search (VNS) procedures.

A Scalable Deterministic Global Optimization Algorithm for Clustering Problems

- Computer Science
- ICML
- 2021

This paper modelled the MSSC task as a two-stage optimization problem and proposed a tailed reduced-space branch and bound (BB) algorithm that only needs to branch on the cluster centers to guarantee convergence and that scales to datasets with up to 200,000 samples.

## References

Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering

- Computer Science
- J. Glob. Optim.
- 2011

A reformulation-linearization based branch-and-bound algorithm for minimum sum-of-squares clustering, claimed to solve instances with up to 1,000 points, is investigated in further detail, and some of the computational experiments reported for it are reproduced.

An Interior Point Algorithm for Minimum Sum-of-Squares Clustering

- Computer Science
- SIAM J. Sci. Comput.
- 1999

An exact algorithm is proposed for minimum sum-of-squares nonhierarchical clustering, i.e., for partitioning a given set of points from a Euclidean m-space into a given number of clusters in order to…

A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering

- Computer Science
- 2008

A branch-and-cut algorithm is proposed for the underlying 0-1 SDP model that obtains exact solutions for fairly large data sets with computing times comparable with those of the best exact method found in the literature.

Heuristic Methods for Large Centroid Clustering Problems

- Computer Science
- J. Heuristics
- 2003

New heuristic methods for solving a class of hard centroid clustering problems including the p-median, the sum-of-squares clustering and the multi-source Weber problems are presented.

Analysis of Global k-Means, an Incremental Heuristic for Minimum Sum-of-Squares Clustering

- Mathematics
- J. Classif.
- 2005

It is shown that global k-means cannot be guaranteed to find the optimum partition for any M ≥ 2 and d > 1; moreover, the same holds for all M > 3 if the new cluster center is chosen anywhere in R^d instead of belonging to X.

Modified global k-means algorithm for minimum sum-of-squares clustering problems

- Computer Science
- Pattern Recognit.
- 2008

A Repetitive Branch-and-Bound Procedure for Minimum Within-Cluster Sums of Squares Partitioning

- Computer Science
- Psychometrika
- 2006

A new branch-and-bound algorithm for minimizing WCSS is presented that provides optimal solutions for problems with up to 240 objects and eight well-separated clusters and was successfully applied to three empirical data sets from the classification literature.

J-MEANS: a new local search heuristic for minimum sum of squares clustering

- Computer Science
- Pattern Recognit.
- 2001

Coordination of Cluster Ensembles via Exact Methods

- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2011

A novel optimization-based method is proposed for combining cluster ensembles for the class of problems with intracluster criteria, such as minimum sum-of-squares clustering (MSSC); it is inspired by a set-partitioning formulation of the original clustering problem.

A genetic algorithm using hyper-quadtrees for low-dimensional k-means clustering

- Computer Science
- IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2006

A genetic algorithm (GA) for evolving centers in the k-means algorithm is presented; it simultaneously identifies good partitions for a range of values around a specified k, finds the global optimum for data sets with known optima, and finds good solutions for large simulated data sets.