• Corpus ID: 12408253

Learning-Theoretic Foundations of Algorithm Configuration for Combinatorial Partitioning Problems

  • Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, Colin White
  • Annual Conference on Computational Learning Theory
Max-cut, clustering, and many other partitioning problems that are of significant importance to machine learning and other scientific fields are NP-hard, a reality that has motivated researchers to develop a wealth of approximation algorithms and heuristics. […] Our algorithms learn over common integer quadratic programming and clustering algorithm families: SDP rounding algorithms and agglomerative clustering algorithms with dynamic programming. For our sample complexity analysis, we provide tight…
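The abstract names SDP rounding algorithms as one of the learned families. As a rough illustration (an assumption, not the paper's code), a parameterized rounding step in the style of Goemans-Williamson hyperplane rounding can project each SDP vector onto a random Gaussian direction and round with a clipped-linear ("s-linear") function whose slope parameter `s` is what a configuration procedure would tune; `s = 0` falls back to plain sign rounding. The function names `s_linear`, `round_sdp_vectors`, and `cut_weight` are hypothetical.

```python
import random

def s_linear(y: float, s: float) -> float:
    """Clip y / s to [-1, 1]; as s -> 0 this approaches the sign function."""
    if s <= 0:
        # hypothetical convention: s == 0 means plain sign rounding
        return 1.0 if y >= 0 else -1.0
    return max(-1.0, min(1.0, y / s))

def round_sdp_vectors(vectors, s, rng):
    """Round SDP vectors u_i to labels in {-1, +1} using one random hyperplane."""
    dim = len(vectors[0])
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # random Gaussian direction Z
    labels = []
    for u in vectors:
        proj = sum(a * b for a, b in zip(u, z))
        p_plus = (1.0 + s_linear(proj, s)) / 2.0  # probability that x_i = +1
        labels.append(1 if rng.random() < p_plus else -1)
    return labels

def cut_weight(weights, labels):
    """Total weight of edges (i, j) whose endpoints receive different labels."""
    return sum(w for (i, j), w in weights.items() if labels[i] != labels[j])
```

Tuning `s` would then amount to comparing the average cut weight produced by different values of `s` across sampled graphs.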

Data-driven Algorithm Design

  • M. Balcan
  • Beyond the Worst-Case Analysis of Algorithms
  • 2020
This chapter surveys recent work that helps put data-driven combinatorial algorithm design on firm foundations and provides strong computational and statistical performance guarantees, both for the batch and online scenarios, where a collection of typical problem instances from the given application is presented either all at once or in an online fashion.
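In the batch scenario, a common baseline is empirical risk minimization: evaluate each candidate parameter on the sampled instances and keep the one with the best average performance. The sketch below assumes a finite candidate grid and a user-supplied `cost` function (both hypothetical); the guarantees surveyed in the chapter concern richer, typically infinite, parameter spaces and are not reproduced here.

```python
from typing import Callable, Sequence, TypeVar

Instance = TypeVar("Instance")

def erm_select(params: Sequence[float],
               instances: Sequence[Instance],
               cost: Callable[[float, Instance], float]) -> float:
    """Return the candidate parameter with the lowest average cost on the samples."""
    def avg_cost(p: float) -> float:
        return sum(cost(p, x) for x in instances) / len(instances)
    return min(params, key=avg_cost)
```

For example, with the toy cost `abs(p - x)` and instances `[1, 2, 3]`, the grid point `2` has the lowest average cost.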

Data-Driven Clustering via Parameterized Lloyd's Families

An infinite family of algorithms generalizing Lloyd's algorithm is defined, which includes the celebrated k-means++ algorithm, as well as the classic farthest-first traversal algorithm.
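A minimal sketch of the seeding step such a family can parameterize, assuming (as an illustration) that each new center is sampled with probability proportional to d(x, C)^α, where d(x, C) is the distance from x to its nearest already-chosen center. Under that assumption α = 0 gives uniform seeding, α = 2 gives k-means++ seeding, and α = ∞ gives farthest-first traversal. The function name `alpha_seeding` is hypothetical, and the subsequent Lloyd's iterations are omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def alpha_seeding(points: Sequence[Point], k: int, alpha: float,
                  rng: random.Random) -> List[Point]:
    """Choose k initial centers by d^alpha sampling."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # distance from each point to its nearest chosen center
        d = [min(math.dist(p, c) for c in centers) for p in points]
        if math.isinf(alpha):
            # alpha = infinity: deterministic farthest-first traversal
            centers.append(points[max(range(len(points)), key=d.__getitem__)])
            continue
        weights = [di ** alpha for di in d]  # note: 0.0 ** 0 == 1.0, so alpha = 0 is uniform
        if sum(weights) == 0:
            centers.append(rng.choice(points))  # every point already coincides with a center
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```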

Learning to Branch

It is shown how to use machine learning to determine an optimal weighting of any set of partitioning procedures for the instance distribution at hand using samples from the distribution, and it is proved that the resulting reduction in tree size can even be exponential.
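A minimal sketch of choosing a branching variable by a weighted combination of several partitioning (scoring) rules. The scoring functions and weights here are hypothetical placeholders; in the setting described above, the weights would be learned from sampled instances, for example by minimizing average tree size over the samples.

```python
from typing import Callable, Sequence

Score = Callable[[int], float]  # hypothetical: maps a candidate variable index to a score

def select_branch_variable(candidates: Sequence[int],
                           scores: Sequence[Score],
                           weights: Sequence[float]) -> int:
    """Pick the candidate that maximizes the weighted sum of the scoring rules."""
    def combined(v: int) -> float:
        return sum(w * s(v) for w, s in zip(weights, scores))
    return max(candidates, key=combined)
```

Changing the weights changes which variable is branched on, which is why the resulting tree size depends on the learned weighting.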

Faster algorithms for learning to link, align sequences, and price two-part tariffs

This work provides algorithms for efficient (output-polynomial) multidimensional parameter tuning, i.e. for families with a small constant number of parameters, for three very different combinatorial problems — linkage-based clustering, dynamic programming for sequence alignment, and auction design for two-part tariff schemes.
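To make "parameter tuning for sequence alignment" concrete, here is a standard Needleman-Wunsch global alignment with linear gap penalties whose output depends on three tunable costs. This is an illustrative choice; the paper's exact parameterization (for example, whether gaps are affine) may differ, the parameter names are hypothetical, and the output-polynomial tuning algorithm itself is not shown.

```python
def alignment_score(a: str, b: str, match: float, mismatch: float, gap: float) -> float:
    """Optimal global alignment score of a and b under the given costs."""
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,      # gap in b
                           dp[i][j - 1] + gap)      # gap in a
    return dp[n][m]
```

As (match, mismatch, gap) vary, the optimal alignment changes only at finitely many boundaries in parameter space, which is the kind of piecewise structure multidimensional tuning has to enumerate.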

Technical perspective: Algorithm selection as a learning problem

  • A. Blum
  • Commun. ACM
  • 2020
This work identifies a new notion called dispersion that enables positive results in principled data-driven algorithm design, including the tuning of hyperparameters for many popular clustering methods.

Dispersion for Data-Driven Algorithm Design, Online Learning, and Private Optimization

This work provides upper and lower bounds on regret for algorithm selection in online settings, and presents general techniques for optimizing the sum or average of piecewise Lipschitz functions when the underlying functions satisfy a sufficient and general condition called dispersion.
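Results in this line of work analyze exponentially weighted forecasters over the parameter space, with dispersion controlling how much payoff can change near discontinuities. As a simplified, hedged illustration, the sketch below runs full-information exponential weights over a finite discretization of a one-dimensional parameter. The grid, the learning rate `eta`, and the function name are assumptions; the paper's algorithms work over continuous parameter spaces rather than a fixed grid.

```python
import math
import random
from typing import Callable, List, Sequence

def exp_weights_online(grid: Sequence[float],
                       utilities: Sequence[Callable[[float], float]],
                       eta: float,
                       rng: random.Random) -> List[float]:
    """utilities[t](p) is the round-t payoff in [0, 1] of parameter p.
    Returns the parameter played in each round."""
    log_w = [0.0] * len(grid)
    played = []
    for u in utilities:
        mx = max(log_w)
        probs = [math.exp(lw - mx) for lw in log_w]  # subtract max for numerical stability
        choice = rng.choices(range(len(grid)), weights=probs, k=1)[0]
        played.append(grid[choice])
        # full information: every grid point's payoff is observed after the round
        for i, p in enumerate(grid):
            log_w[i] += eta * u(p)
    return played
```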

Learning to Link

This work designs efficient learning algorithms which receive samples from an application-specific distribution over clustering instances and simultaneously learn both a near-optimal distance and clustering algorithm from these classes, and carries out a comprehensive empirical evaluation of these techniques.
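A minimal sketch of the two ingredients being learned, under illustrative parameterizations that are assumptions, not necessarily the paper's: a linkage rule whose merge cost interpolates single linkage (α = 0) and complete linkage (α = 1), and a metric formed as a convex combination of two base distances with weight β. The function names are hypothetical, and the naive implementation recomputes all merge costs at every step.

```python
from itertools import combinations
from typing import Callable, List

Dist = Callable[[int, int], float]

def alpha_linkage(n: int, k: int, d: Dist, alpha: float) -> List[List[int]]:
    """Agglomeratively merge points 0..n-1 until k clusters remain."""
    clusters: List[List[int]] = [[i] for i in range(n)]

    def merge_cost(a: List[int], b: List[int]) -> float:
        ds = [d(x, y) for x in a for y in b]
        return (1 - alpha) * min(ds) + alpha * max(ds)

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: merge_cost(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return clusters

def mixed_distance(d1: Dist, d2: Dist, beta: float) -> Dist:
    """Convex combination of two base distances."""
    return lambda x, y: (1 - beta) * d1(x, y) + beta * d2(x, y)
```

Learning both parts then means searching over (α, β) for the pair whose clusterings score best on the sampled instances.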

Improved Learning Bounds for Branch-and-Cut

This paper proves sample complexity guarantees for data-driven tuning of branch-and-cut parameters, which bound how large the training set should be to ensure that, for any configuration, its average performance over the training set is close to its expected future performance.
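For intuition about what such a guarantee looks like, here is the textbook bound for the simplest case: a finite set of N configurations with costs in [0, H]. By Hoeffding's inequality and a union bound, m ≥ H² ln(2N/δ) / (2ε²) samples ensure that, with probability at least 1 − δ, every configuration's average cost is within ε of its expectation. This finite-class bound is only an illustration; the paper handles infinite configuration spaces through capacity measures of the configuration family. The function name is hypothetical.

```python
import math

def hoeffding_sample_size(num_configs: int, cost_range: float,
                          epsilon: float, delta: float) -> int:
    """Samples sufficient for uniform epsilon-accuracy over num_configs
    configurations with costs in [0, cost_range], with prob. >= 1 - delta."""
    m = (cost_range ** 2) * math.log(2 * num_configs / delta) / (2 * epsilon ** 2)
    return math.ceil(m)
```

For a single configuration with H = 1, ε = 0.1, and δ = 0.05, this gives 185 samples; the requirement grows only logarithmically in the number of configurations.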

Procrastinating with Confidence: Near-Optimal, Anytime, Adaptive Algorithm Configuration

A new algorithm is introduced that preserves the near-optimality and anytime properties of Structured Procrastination while adding adaptivity, and will perform dramatically faster in settings where many algorithm configurations perform poorly.

New Aspects of Beyond Worst-Case Analysis

This thesis designs efficient algorithms that output optimal or near-optimal clusterings for the canonical k-center objective under perturbation resilience, and proposes data-dependent dispatching algorithms which cast the problem as clustering with important balance and fault-tolerance conditions.



The Design of Approximation Algorithms

This book shows how to design approximation algorithms: efficient algorithms that find provably near-optimal solutions to discrete optimization problems.

Center-based clustering under perturbation stability

Empirical hardness models: Methodology and a case study on combinatorial auctions

The use of supervised machine learning is proposed to build models that predict an algorithm's runtime on a given problem instance, and techniques for interpreting these models are described to gain an understanding of the characteristics that make instances hard or easy.

Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming

This algorithm gives the first substantial progress in approximating MAX CUT in nearly twenty years, and represents the first use of semidefinite programming in the design of approximation algorithms.

Clustering under Perturbation Resilience

This paper presents an algorithm that can optimally cluster instances resilient to $(1 + \sqrt{2})$-factor perturbations, solving an open problem of Awasthi et al.

A Meta-Heuristic Factory for Vehicle Routing Problems

A combination of meta-heuristics that yields new best-known results on the Solomon benchmarks is demonstrated, along with a method to automatically adjust this combination to handle problems of different sizes, complexity, and optimization objectives.

Improved Analysis of Complete-Linkage Clustering

It is proved that the complete-linkage method computes an O(1)-approximation for the problem of partitioning a point set into k clusters so as to minimize the largest cluster diameter, for any metric that is induced by a norm, assuming that the dimension d is a constant.

Simpler Analyses of Local Search Algorithms for Facility Location

A proof of the k-median result is given that avoids the "coupling" argument and can be used in other settings where the arguments of Arya et al. have been used.
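For reference, single-swap local search for k-median, the kind of algorithm these analyses concern, can be sketched as follows. This is an illustrative brute-force version that re-evaluates every swap from scratch; the improvement threshold `eps` and the function names are assumptions, and points must be hashable.

```python
from itertools import product
from typing import Callable, Hashable, Sequence, Set, Tuple

def kmedian_cost(points: Sequence[Hashable], centers: Set[Hashable],
                 d: Callable[[Hashable, Hashable], float]) -> float:
    """Sum over points of the distance to the nearest center."""
    return sum(min(d(p, c) for c in centers) for p in points)

def local_search_kmedian(points: Sequence[Hashable], k: int,
                         d: Callable[[Hashable, Hashable], float],
                         eps: float = 0.0) -> Tuple[Set[Hashable], float]:
    """Start from the first k points; swap one center for one non-center
    while the swap lowers the cost below (1 - eps) times the current cost."""
    centers = set(points[:k])
    cost = kmedian_cost(points, centers, d)
    improved = True
    while improved:
        improved = False
        for out, inn in product(list(centers), points):
            if inn in centers:
                continue
            cand = (centers - {out}) | {inn}
            c = kmedian_cost(points, cand, d)
            if c < (1 - eps) * cost:
                centers, cost, improved = cand, c, True
                break  # restart the scan from the new solution
    return centers, cost
```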

k-center Clustering under Perturbation Resilience

This work provides strong positive results for both the asymmetric and symmetric k-center problems under a natural input stability (promise) condition called α-perturbation resilience, and gives algorithms with strong guarantees simultaneously for stable and non-stable instances.

Incremental clustering and dynamic information retrieval

This work considers the problem of clustering dynamic point sets in a metric space and proposes a model called incremental clustering which is based on a careful analysis of the requirements of the information retrieval application, and which should also be useful in other applications.