Corpus ID: 85458947

A Stochastic Penalty Model for Convex and Nonconvex Optimization with Big Constraints

@article{Mishchenko2018ASP,
  title={A Stochastic Penalty Model for Convex and Nonconvex Optimization with Big Constraints},
  author={Konstantin Mishchenko and Peter Richt{\'a}rik},
  journal={arXiv: Optimization and Control},
  year={2018}
}
The last decade witnessed a rise in the importance of supervised learning applications involving big data and big models. Big data refers to situations where the amount of training data available and needed causes difficulties in the training phase of the pipeline. Big model refers to situations where large-dimensional and over-parameterized models are needed for the application at hand. Both of these phenomena lead to a dramatic increase in research activity aimed at taming the…

Almost surely constrained convex optimization

TLDR
The proposed stochastic gradient framework uses smoothing and homotopy techniques to handle constraints without the need for matrix-valued projections, and the algorithm is shown to achieve state-of-the-art practical performance.
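
To make the penalty idea concrete, here is a minimal Python sketch (numpy only): stochastic gradient descent on a quadratic penalty for the constraints, with a growing penalty weight standing in for the homotopy schedule. This is not the authors' algorithm; grad_f, project, and the step/penalty schedules are illustrative assumptions.

import numpy as np

def stochastic_penalty_sgd(grad_f, project, n_funcs, n_sets, x0,
                           steps=10000, gamma0=1.0, beta0=1.0):
    """Hypothetical sketch: SGD on f(x) + (beta_k/2) * dist(x, X_j)^2,
    sampling one data function i and one constraint set j per step.
    grad_f(i, x) -> stochastic gradient of f_i at x.
    project(j, x) -> projection of x onto the j-th simple set X_j.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(1, steps + 1):
        i = np.random.randint(n_funcs)
        j = np.random.randint(n_sets)
        gamma = gamma0 / np.sqrt(k)          # diminishing step size (illustrative)
        beta = beta0 * np.sqrt(k)            # growing penalty weight (homotopy stand-in)
        # gradient of (beta/2) * dist(x, X_j)^2 is beta * (x - proj_{X_j}(x))
        g = grad_f(i, x) + beta * (x - project(j, x))
        x = x - gamma * g
    return x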

New Penalized Stochastic Gradient Methods for Linearly Constrained Strongly Convex Optimization

TLDR
The nested structure of the algorithm, together with upper bounds on the distance to the optimal solutions, allows one to safely eliminate constraints that are inactive at an optimal solution throughout the run of the algorithm, which leads to improved complexity results.

A Stochastic Decoupling Method for Minimizing the Sum of Smooth and Non-Smooth Functions

TLDR
A variance-reduced method which is able to progressively learn the proximal operator of $g$ by computing the prox of only a single randomly selected function in each iteration, and which achieves a linear rate for minimizing a strongly convex function $f$ under linear constraints, with no assumption on the constraints beyond consistency.
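
As a rough illustration of the "one prox per iteration" pattern, the sketch below is a plain stochastic proximal gradient step (no variance reduction, so it is a stand-in rather than the method from the paper); prox_g and grad_f are user-supplied placeholders.

import numpy as np

def stochastic_prox_method(grad_f, prox_g, n_terms, x0, steps=5000, alpha0=0.5):
    """Illustrative stand-in (not the paper's variance-reduced scheme):
    minimize f(x) + (1/n) * sum_i g_i(x) by taking a gradient step on f
    and then the prox of ONE randomly chosen g_i per iteration.
    prox_g(i, v, alpha) should return argmin_u g_i(u) + ||u - v||^2 / (2*alpha).
    """
    x = np.asarray(x0, dtype=float)
    for k in range(1, steps + 1):
        alpha = alpha0 / np.sqrt(k)          # diminishing step size (illustrative)
        i = np.random.randint(n_terms)
        x = prox_g(i, x - alpha * grad_f(x), alpha)
    return x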

A Self-supervised Approach to Hierarchical Forecasting with Applications to Groupwise Synthetic Controls

TLDR
A new loss function is proposed that can be incorporated into any maximum likelihood objective with hierarchical data, resulting in reconciled estimates with confidence intervals that correctly account for additional uncertainty due to imperfect reconciliation.

Sinkhorn Algorithm as a Special Case of Stochastic Mirror Descent

TLDR
A new perspective on the celebrated Sinkhorn algorithm is presented by showing that it is a special case of incremental/stochastic mirror descent; the discovered equivalence allows the authors to propose new methods for optimal transport, including an extension of the Sinkhorn algorithm beyond two constraints.
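
For reference, the standard two-marginal Sinkhorn iteration that this equivalence reinterprets: each update enforces one marginal constraint exactly, which is the incremental/mirror-descent view. The sketch below is the textbook version, not the authors' extension beyond two constraints.

import numpy as np

def sinkhorn(C, a, b, eps=0.05, iters=1000):
    """Standard Sinkhorn iterations for entropy-regularized optimal transport:
    alternately rescale the rows and columns of K = exp(-C/eps) so that the
    plan diag(u) K diag(v) matches the marginals a and b.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    K = np.exp(-np.asarray(C, dtype=float) / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):
        u = a / (K @ v)          # enforce the row-marginal constraint
        v = b / (K.T @ u)        # enforce the column-marginal constraint
    return u[:, None] * K * v[None, :]   # the transport plan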

References

SHOWING 1-10 OF 32 REFERENCES

Convex Optimization over Intersection of Simple Sets: improved Convergence Rate Guarantees via an Exact Penalty Approach

TLDR
This work considers the problem of minimizing a convex function over the intersection of finitely many simple sets which are easy to project onto, and derives first-order algorithms which improve the complexity to $O(1/\varepsilon)$, and to $O(1/\sqrt{\varepsilon})$ for smooth functions.
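
A minimal sketch of the exact-penalty idea under illustrative choices of the penalty weight and step size: the intersection constraint is replaced by a sum of distance penalties and a subgradient method is run. The callables projections and grad_f are user-supplied placeholders.

import numpy as np

def penalized_subgradient(grad_f, projections, x0, lam=10.0, steps=20000, gamma0=1.0):
    """Replace min f(x) s.t. x in C_1 ∩ ... ∩ C_m by
    min f(x) + lam * sum_i dist(x, C_i) and run a subgradient method.
    projections[i](x) must return the projection of x onto C_i.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(1, steps + 1):
        g = np.asarray(grad_f(x), dtype=float)
        for proj in projections:
            p = proj(x)
            d = np.linalg.norm(x - p)
            if d > 1e-12:
                g += lam * (x - p) / d      # subgradient of lam * dist(x, C_i)
        x = x - (gamma0 / np.sqrt(k)) * g   # diminishing step size (illustrative)
    return x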

SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

TLDR
It is shown that for stochastic problems arising in machine learning such a bound always holds, and an alternative convergence analysis of SGD with a diminishing learning rate regime is proposed, which results in more relaxed conditions than those in (Bottou et al., 2016).
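
For concreteness, a plain SGD loop in the diminishing-step-size regime referred to above; the schedule gamma_k = gamma0 / (1 + k) is only one illustrative choice.

import numpy as np

def sgd_diminishing(grad_i, n, x0, epochs=50, gamma0=0.1):
    """Plain SGD with a diminishing step size.
    grad_i(i, x) returns the stochastic gradient of the i-th loss at x.
    """
    x = np.asarray(x0, dtype=float)
    k = 0
    for _ in range(epochs):
        for i in np.random.permutation(n):
            x = x - (gamma0 / (1.0 + k)) * grad_i(i, x)
            k += 1
    return x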

Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

TLDR
ProxASAGA is proposed: a fully asynchronous sparse method inspired by SAGA (a variance-reduced incremental gradient algorithm) that achieves a theoretical linear speedup over the sequential version under assumptions on the sparsity of gradients and block-separability of the proximal term.

Proximal Alternating Penalty Algorithms for Constrained Convex Optimization

TLDR
Two new proximal alternating penalty algorithms are proposed to solve a wide class of constrained convex optimization problems, achieving the best-known $O(1/k)$ convergence rate in the non-ergodic sense, where $k$ is the iteration counter.

Parallel coordinate descent methods for big data optimization

In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex function.
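
A minimal serial sketch of the randomized (block) coordinate descent pattern: sampling tau coordinates per step mimics the parallel setting, but the safe step size for tau > 1 depends on the separability structure analyzed in the paper, so the 1/L[i] step below is an assumption.

import numpy as np

def randomized_coordinate_descent(grad_coord, L, x0, steps=100000, tau=1):
    """Pick tau coordinates uniformly at random and update each with step
    1/L[i], where L[i] is a coordinate-wise Lipschitz constant.
    grad_coord(i, x) is the i-th partial derivative of the smooth part at x.
    """
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(steps):
        coords = np.random.choice(n, size=tau, replace=False)
        for i in coords:
            x[i] -= grad_coord(i, x) / L[i]
    return x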

Randomized projection methods for convex feasibility problems: conditioning and convergence rates

TLDR
A general random projection algorithmic framework which extends many existing projection schemes to the random setting, is designed for the general convex feasibility problem, and allows one to project simultaneously onto several sets, thus providing great flexibility in matching the implementation of the algorithm to the parallel architecture at hand.
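
A small sketch of the randomized projection idea for the feasibility problem: block = 1 is the classical random alternating projection, while block > 1 averages several projections at once (the simultaneous/parallel flavor). Uniform sampling and equal averaging weights are assumptions, not the paper's general framework.

import numpy as np

def random_projection_feasibility(projections, x0, steps=5000, block=1):
    """Find a point in C_1 ∩ ... ∩ C_m: at each iteration sample `block` sets
    and move to the average of their projections.
    projections[i](x) must return the projection of x onto C_i.
    """
    x = np.asarray(x0, dtype=float)
    m = len(projections)
    for _ in range(steps):
        idx = np.random.choice(m, size=block, replace=False)
        x = np.mean([projections[i](x) for i in idx], axis=0)
    return x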

Importance Sampling for Minibatches

TLDR
This paper proposes the first importance sampling for minibatches, gives a simple and rigorous complexity analysis of its performance, and illustrates on synthetic problems that for training data with certain properties the sampling can lead to several orders of magnitude improvement in training time.
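
An illustrative (not the paper's exact) importance-sampling scheme for minibatches: examples are drawn with probabilities proportional to their smoothness constants and reweighted so the minibatch gradient remains an unbiased estimate of the full gradient.

import numpy as np

def importance_sampled_sgd(grad_i, L, x0, steps=10000, batch=16, gamma=0.01):
    """Sample indices with p_i proportional to L[i] (with replacement) and
    reweight each sampled gradient by 1/(n*p_i) to keep the estimator unbiased.
    grad_i(i, x) returns the gradient of the i-th loss at x.
    """
    x = np.asarray(x0, dtype=float)
    L = np.asarray(L, dtype=float)
    n = L.size
    p = L / L.sum()
    for _ in range(steps):
        idx = np.random.choice(n, size=batch, p=p)
        g = np.mean([grad_i(i, x) / (n * p[i]) for i in idx], axis=0)
        x = x - gamma * g                    # fixed step size (illustrative)
    return x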

Stochastic Dual Ascent for Solving Linear Systems

TLDR
It is proved that primal iterates associated with the dual process converge to the projection exponentially fast in expectation, and the same rate applies to dual function values, primal function values and the duality gap.
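
Randomized Kaczmarz is the best-known special case of this primal-dual family; below is a short sketch for a consistent system Ax = b, with rows sampled proportionally to their squared norms, which gives the classical exponential (linear) rate in expectation.

import numpy as np

def randomized_kaczmarz(A, b, x0=None, steps=100000):
    """Each step projects the current iterate onto the solution set of one
    randomly chosen equation a_i^T x = b_i of a consistent system Ax = b.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    m, n = A.shape
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    row_norms = np.einsum('ij,ij->i', A, A)
    p = row_norms / row_norms.sum()
    for _ in range(steps):
        i = np.random.choice(m, p=p)
        x += (b[i] - A[i] @ x) / row_norms[i] * A[i]
    return x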

Accelerated, Parallel, and Proximal Coordinate Descent

TLDR
A new randomized coordinate descent method for minimizing the sum of convex functions, each of which depends only on a small number of coordinates; the method can be implemented without full-dimensional vector operations, which are the major bottleneck of accelerated coordinate descent.
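
A non-accelerated, serial stand-in showing the coordinate-wise prox update that such methods build on; the momentum sequences and block sampling of the accelerated, parallel variant are omitted, and prox_coord, grad_coord, and L are placeholders.

import numpy as np

def proximal_coordinate_descent(grad_coord, prox_coord, L, x0, steps=100000):
    """Pick one coordinate, take a gradient step with step size 1/L[i], then
    apply the coordinate-wise prox of the separable nonsmooth term.
    prox_coord(i, v, alpha) = argmin_t psi_i(t) + (t - v)^2 / (2*alpha).
    """
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(steps):
        i = np.random.randint(n)
        step = 1.0 / L[i]
        x[i] = prox_coord(i, x[i] - step * grad_coord(i, x), step)
    return x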

Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems

Y. Nesterov, SIAM J. Optim., 2012

TLDR
Surprisingly enough, for certain classes of objective functions, the proposed methods for solving huge-scale optimization problems achieve complexity bounds that are better than the standard worst-case bounds for deterministic algorithms.