An improved column generation algorithm for minimum sum-of-squares clustering

@article{Aloise2012AnIC,
  title={An improved column generation algorithm for minimum sum-of-squares clustering},
  author={Daniel Aloise and Pierre Hansen and Leo Liberti},
  journal={Mathematical Programming},
  year={2012},
  volume={131},
  pages={195--220}
}
Given a set of entities associated with points in Euclidean space, minimum sum-of-squares clustering (MSSC) consists in partitioning this set into clusters such that the sum of squared distances from each point to the centroid of its cluster is minimized. A column generation algorithm for MSSC was given by du Merle et al. in SIAM Journal on Scientific Computing 21:1485–1505. The bottleneck of that algorithm is the resolution of the auxiliary problem of finding a column with negative reduced cost…
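As a minimal sketch of the objective described in the abstract (not the paper's column generation algorithm), the function below evaluates the MSSC cost of a given partition: the sum of squared Euclidean distances from each point to the centroid of its cluster. The name `mssc_objective` and the toy data are illustrative assumptions, not taken from the source.

```python
from typing import Dict, Hashable, List, Sequence


def mssc_objective(points: Sequence[Sequence[float]],
                   labels: Sequence[Hashable]) -> float:
    """Return the sum of squared distances from each point to its cluster centroid.

    `points` and `labels` are hypothetical inputs: labels[i] is the cluster
    assigned to points[i].
    """
    # Group the points by cluster label.
    clusters: Dict[Hashable, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # The centroid is the coordinate-wise mean of the cluster's points.
        centroid = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        # Accumulate squared Euclidean distances to that centroid.
        total += sum((p[j] - centroid[j]) ** 2 for p in members for j in range(dim))
    return total


# Toy data: two well-separated pairs of points in the plane.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
good_cost = mssc_objective(toy_points, [0, 0, 1, 1])
bad_cost = mssc_objective(toy_points, [0, 1, 0, 1])
```

On the toy data, grouping nearby points together gives a lower cost than mixing the pairs. An exact MSSC method searches over all partitions for the one minimizing this value.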
Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering
TLDR
A reformulation-linearization based branch-and-bound algorithm for minimum sum-of-squares clustering, claimed to solve instances with up to 1,000 points, is investigated in further detail, and some of its computational experiments are reproduced.
J-means and I-means for minimum sum-of-squares clustering on networks
TLDR
Experimental results indicate that the implemented VNS-based heuristic for solving the Edge minimum sum-of-squares clustering problem produces the best known results in the literature.
An iterative algorithm for the solution of very large-scale diameter clustering problems
TLDR
An iterative algorithm for the solution of the diameter minimization clustering problem (DMCP) that can solve problems containing almost 600,000 entities while consuming only moderate amounts of time and memory is introduced.
Side-constrained minimum sum-of-squares clustering: mathematical programming and random projections
TLDR
It is shown that when side constraints make k-means inapplicable, the proposed methodology—which is easy and fast to implement and deploy—can obtain good solutions in limited amounts of time.
An Exact Algorithm for Semi-supervised Minimum Sum-of-Squares Clustering
TLDR
This paper presents a new branch-and-bound algorithm for semi-supervised MSSC, where background knowledge is incorporated as pairwise must-link and cannot-link constraints, and efficiently solves real-world instances with up to 800 data points and different combinations of must-link and cannot-link constraints.
Constrained Minimum Sum of Squares Clustering by Constraint Programming
TLDR
Experiments on classic datasets show that the proposed global optimization constraint for the Within-Cluster Sum of Squares criterion outperforms the exact approach based on integer linear programming and column generation.
Improving spectral bounds for clustering problems by Lagrangian relaxation
TLDR
This paper investigates how to tighten the spectral bounds by using Lagrangian relaxation and subgradient optimization methods.
Review of Basic Local Searches for Solving the Minimum Sum-of-Squares Clustering Problem
TLDR
In order to show how these local searches can be implemented within a metaheuristic framework, the new heuristics are applied in the local improvement step of two variable neighborhood search (VNS) procedures.
A Scalable Deterministic Global Optimization Algorithm for Clustering Problems
TLDR
This paper models the MSSC task as a two-stage optimization problem and proposes a tailored reduced-space branch and bound (BB) algorithm that only needs to branch on the cluster centers to guarantee convergence and scales to datasets with up to 200,000 samples.

References

Evaluating a branch-and-bound RLT-based algorithm for minimum sum-of-squares clustering
TLDR
A reformulation-linearization based branch-and-bound algorithm for minimum sum-of-squares clustering, claimed to solve instances with up to 1,000 points, is investigated in further detail, and some of its computational experiments are reproduced.
An Interior Point Algorithm for Minimum Sum-of-Squares Clustering
An exact algorithm is proposed for minimum sum-of-squares nonhierarchical clustering, i.e., for partitioning a given set of points from a Euclidean m-space into a given number of clusters in order to
A branch-and-cut SDP-based algorithm for minimum sum-of-squares clustering
TLDR
A branch-and-cut algorithm is proposed for the underlying 0-1 SDP model that obtains exact solutions for fairly large data sets with computing times comparable with those of the best exact method found in the literature.
Heuristic Methods for Large Centroid Clustering Problems
TLDR
New heuristic methods for solving a class of hard centroid clustering problems including the p-median, the sum-of-squares clustering and the multi-source Weber problems are presented.
Analysis of Global k-Means, an Incremental Heuristic for Minimum Sum-of-Squares Clustering
TLDR
It is shown that global k-means cannot be guaranteed to find the optimum partition for any M ≥ 2 and d > 1; moreover, the same holds for all M > 3 if the new cluster center is chosen anywhere in R^d instead of belonging to X.
Modified global k-means algorithm for minimum sum-of-squares clustering problems
A Repetitive Branch-and-Bound Procedure for Minimum Within-Cluster Sums of Squares Partitioning
TLDR
A new branch-and-bound algorithm for minimizing WCSS is presented that provides optimal solutions for problems with up to 240 objects and eight well-separated clusters and was successfully applied to three empirical data sets from the classification literature.
J-MEANS: a new local search heuristic for minimum sum of squares clustering
Coordination of Cluster Ensembles via Exact Methods
  • I. Christou
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2011
TLDR
A novel optimization-based method for the combination of cluster ensembles for the class of problems with intracluster criteria, such as Minimum-Sum-of-Squares-Clustering (MSSC), which is inspired from a Set-Partitioning formulation of the original clustering problem.
A genetic algorithm using hyper-quadtrees for low-dimensional k-means clustering
  • M. Laszlo, S. Mukherjee
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2006
TLDR
A genetic algorithm (GA) for evolving centers in the k-means algorithm that simultaneously identifies good partitions for a range of values around a specified k, finds the global optimum for data sets with known optima, and finds good solutions for large simulated data sets.