# Gradient methods for minimizing composite functions

@article{Nesterov2013GradientMF, title={Gradient methods for minimizing composite functions}, author={Yurii Nesterov}, journal={Mathematical Programming}, year={2013}, volume={140}, pages={125-161} }

In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum of two terms: one is smooth and given by a black-box oracle, and another is a simple general convex function with known structure. Despite the absence of good properties of the sum, such problems, both in convex and nonconvex cases, can be solved with efficiency typical for the first part of the objective. For convex problems of the above structure, we consider primal and…
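To make the problem structure in the abstract concrete, here is a minimal sketch of a basic proximal gradient iteration for one composite objective. The choice of a least-squares smooth term, an $\ell_1$ penalty as the "simple" term, and a known Lipschitz constant `L` are all assumptions made for illustration; the function names are hypothetical and this is not claimed to be any of the specific schemes analyzed in the paper.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def grad_least_squares(A, b, x):
    """Gradient of the smooth part f(x) = 0.5 * ||A x - b||^2, i.e. A^T (A x - b)."""
    residual = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) - b_i
                for row, b_i in zip(A, b)]
    return [sum(A[i][j] * residual[i] for i in range(len(A)))
            for j in range(len(x))]


def proximal_gradient(A, b, lam, L, iters=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 with fixed step 1 / L.

    L is assumed to be an upper bound on the Lipschitz constant of the gradient.
    """
    x = [0.0] * len(A[0])
    for _ in range(iters):
        g = grad_least_squares(A, b, x)
        # Gradient step on the smooth term, then the prox of the simple term.
        x = [soft_threshold(x_j - g_j / L, lam / L) for x_j, g_j in zip(x, g)]
    return x


# Tiny example with A = I, where the minimizer is the soft-thresholded b.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.5]
x_hat = proximal_gradient(A, b, lam=1.0, L=1.0)
```

With `A` equal to the identity the minimizer can be checked by hand: each coordinate of `b` is shrunk toward zero by `lam`, so the small coordinate is set exactly to zero.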

## 1,130 Citations

### An adaptive accelerated proximal gradient method and its homotopy continuation for sparse optimization

- Computer Science, Mathematics
- Comput. Optim. Appl.
- 2015

An accelerated proximal gradient method is presented for problems where the smooth part of the objective function is also strongly convex. The method incorporates an efficient line-search procedure and achieves the optimal iteration complexity for such composite optimization problems.

### Accelerated Regularized Newton Methods for Minimizing Composite Convex Functions

- Mathematics, Computer Science
- SIAM J. Optim.
- 2019

In this paper, we study accelerated Regularized Newton Methods for minimizing objectives formed as a sum of two functions: one is convex and twice differentiable with Hölder-continuous Hessian, and…

### A FISTA-type accelerated gradient algorithm for solving smooth nonconvex composite optimization problems

- Mathematics, Computer Science
- Comput. Optim. Appl.
- 2021

In this paper, we describe and establish iteration-complexity of two accelerated composite gradient (ACG) variants to solve a smooth nonconvex composite optimization problem whose objective function…

### Efficiency of minimizing compositions of convex functions and smooth maps

- Computer Science, Mathematics
- Math. Program.
- 2019

It is shown that when the subproblems can only be solved by first-order methods, a simple combination of smoothing, the prox-linear method, and a fast-gradient scheme yields an algorithm with complexity, akin to gradient descent for smooth minimization.

### Accelerated inexact composite gradient methods for nonconvex spectral optimization problems

- Computer Science, Mathematics
- Comput. Optim. Appl.
- 2022

Two inexact composite gradient methods, one inner accelerated and another doubly accelerated, are presented for solving a class of nonconvex spectral composite optimization problems. Both take advantage of the composite and spectral structure underlying the objective function in order to efficiently generate their solutions.

### Primal-dual fast gradient method with a model

- Computer Science, Mathematics
- 2019

The main idea is the following: to find a dual solution to an approximation of a primal problem using the concept of a $(\delta, L)$-model, the principle of "divide and conquer" is applied.

### Complexity bounds for primal-dual methods minimizing the model of objective function

- Mathematics, Computer Science
- Math. Program.
- 2018

This work provides the Frank–Wolfe method with a convergence analysis that allows approaching a primal-dual solution of a convex optimization problem with a composite objective function. It also justifies a new variant of this method, which can be seen as a trust-region scheme applied to the linear model of the objective function.

### MOCCA: Mirrored Convex/Concave Optimization for Nonconvex Composite Functions

- Computer Science
- J. Mach. Learn. Res.
- 2016

The MOCCA (mirrored convex/concave) algorithm is proposed, a primal/dual optimization approach that takes a local convex approximation to each term at every iteration, and offers theoretical guarantees for convergence when the overall problem is approximately convex.

### Augmented Lagrangian based first-order methods for convex and nonconvex programs: nonergodic convergence and iteration complexity

- Computer Science
- ArXiv
- 2020

A nonergodic convergence rate result is established for an augmented Lagrangian (AL) based first-order method (FOM) for convex problems with functional constraints, and a novel AL-based FOM is designed for problems with non-convex objective and convex constraint functions.

### On Convergence Rates of Linearized Proximal Algorithms for Convex Composite Optimization with Applications

- Mathematics, Computer Science
- SIAM J. Optim.
- 2016

Under the assumptions of local weak sharp minima of order $p$ ($p \in [1,2]$) and a quasi-regularity condition, a local superlinear convergence rate is established for the linearized proximal algorithm (LPA).

## References

### Gradient methods for minimizing composite objective function

- Computer Science, Mathematics
- 2007

In this paper we analyze several new methods for solving optimization problems with the objective function formed as a sum of two convex terms: one is smooth and given by a black-box oracle, and…

### Accelerating the cubic regularization of Newton’s method on convex problems

- Mathematics
- Math. Program.
- 2008

An accelerated version of the cubic regularization of Newton’s method is proposed that converges for the same problem class with order $O(1/k^3)$, keeping the complexity of each iteration unchanged. It is also argued that, for second-order schemes, the class of non-degenerate problems is different from the standard class.

### Rounding of convex sets and efficient gradient methods for linear programming problems

- Mathematics, Computer Science
- Optim. Methods Softw.
- 2008

It is proved that the upper complexity bound for both schemes is $O\left(\frac{\sqrt{n \ln m}}{\delta} \ln n\right)$ iterations of a gradient-type method, where $n$ and $m$ are the sizes of the corresponding linear programming problems.

### A generalized proximal point algorithm for certain non-convex minimization problems

- Mathematics, Computer Science
- 1981

This algorithm may be viewed as a generalization of the proximal point algorithm to cope with non-convexity of the objective function by linearizing the differentiable term at each iteration.

### Introductory Lectures on Convex Optimization - A Basic Course

- Computer Science
- Applied Optimization
- 2004

It was in the middle of the 1980s, when the seminal paper by Karmarkar opened a new epoch in nonlinear optimization, and it became more and more common that the new methods were provided with a complexity analysis, which was considered a better justification of their efficiency than computational experiments.

### Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems

- Computer Science
- IEEE Journal of Selected Topics in Signal Processing
- 2007

This paper proposes gradient projection algorithms for the bound-constrained quadratic programming (BCQP) formulation of these problems and tests variants of this approach that select the line search parameters in different ways, including techniques based on the Barzilai-Borwein method.
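As a rough illustration of the idea summarized above, the following sketch runs projected gradient steps on a small bound-constrained quadratic program, choosing the step length with a Barzilai-Borwein formula. The nonnegativity bound, the specific BB variant, the stopping rule, and every function name are assumptions for illustration; the algorithms in the cited paper include safeguards and line-search variants that are not reproduced here.

```python
def project_nonneg(x):
    """Euclidean projection onto the nonnegative orthant."""
    return [max(0.0, x_i) for x_i in x]


def qp_gradient(Q, c, x):
    """Gradient of 0.5 * x^T Q x + c^T x for symmetric Q, i.e. Q x + c."""
    return [sum(q_ij * x_j for q_ij, x_j in zip(row, x)) + c_i
            for row, c_i in zip(Q, c)]


def projected_gradient_bb(Q, c, x0, alpha0=1.0, iters=100):
    """Projected gradient for min 0.5 x^T Q x + c^T x subject to x >= 0.

    The step length uses the BB rule alpha = (s.s) / (s.y), with
    s = x_new - x and y = grad_new - grad.
    """
    x = list(x0)
    g = qp_gradient(Q, c, x)
    alpha = alpha0
    for _ in range(iters):
        x_new = project_nonneg([x_i - alpha * g_i for x_i, g_i in zip(x, g)])
        g_new = qp_gradient(Q, c, x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        x, g = x_new, g_new
        sy = sum(s_i * y_i for s_i, y_i in zip(s, y))
        if sy <= 0.0:
            # Either the iterate did not move or the BB ratio is undefined; stop.
            break
        alpha = sum(s_i * s_i for s_i in s) / sy
    return x


# Small separable example whose solution can be checked by hand.
Q = [[2.0, 0.0], [0.0, 1.0]]
c = [-2.0, 1.0]
x_qp = projected_gradient_bb(Q, c, [0.0, 0.0])
```

In this example the unconstrained minimizer is `[1, -1]`, so the second coordinate is clipped at the bound and the constrained solution is `[1, 0]`.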

### Smooth minimization of non-smooth functions

- Computer Science
- Math. Program.
- 2005

A new approach for constructing efficient schemes for non-smooth convex optimization is proposed, based on a special smoothing technique, which can be applied to functions with explicit max-structure, and can be considered as an alternative to black-box minimization.
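A small worked case may help show what smoothing a max-structured function looks like. The sketch below assumes the absolute value written as $|x| = \max_{|u| \le 1} ux$ and subtracts a quadratic term $\mu u^2 / 2$ inside the maximum, which gives a Huber-type function. The choice of this example, the quadratic term, and the function names are assumptions for illustration rather than the paper's general construction.

```python
def smoothed_abs(x, mu):
    """Value of max over |u| <= 1 of (u * x - 0.5 * mu * u**2), with mu > 0."""
    u = max(-1.0, min(1.0, x / mu))  # maximizer: x / mu clipped to [-1, 1]
    return u * x - 0.5 * mu * u * u


def smoothed_abs_grad(x, mu):
    """Derivative of smoothed_abs in x; it equals the maximizing u."""
    return max(-1.0, min(1.0, x / mu))


mu = 0.5
# The smoothed function stays within mu / 2 of |x| at these sample points.
gaps = [abs(x) - smoothed_abs(x, mu) for x in (-2.0, -0.1, 0.0, 0.3, 1.0)]
```

The derivative of the smoothed function changes at a rate of at most `1 / mu`, so a smaller `mu` gives a tighter approximation of `|x|` at the cost of a larger Lipschitz constant for the gradient.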

### Just relax: convex programming methods for identifying sparse signals in noise

- Computer Science
- IEEE Transactions on Information Theory
- 2006

This paper studies a method called convex relaxation, which attempts to recover the ideal sparse signal by solving a convex program that can be completed in polynomial time with standard scientific software.

### Iterative solution of nonlinear equations in several variables

- Mathematics
- Computer science and applied mathematics
- 1970

Convergence of Minimization Methods; An Annotated List of Basic Reference Books; Bibliography; Author Index; Subject Index.

### Atomic Decomposition by Basis Pursuit

- Computer Science
- SIAM J. Sci. Comput.
- 1998

Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest $\ell_1$ norm of coefficients among all such decompositions.