Corpus ID: 221949400

Half-Space Proximal Stochastic Gradient Method for Group-Sparsity Regularized Problem

@article{Chen2020HalfSpacePS,
  title={Half-Space Proximal Stochastic Gradient Method for Group-Sparsity Regularized Problem},
  author={Tianyi Chen and Guanyi Wang and Tianyu Ding and Bo Ji and Sheng Yi and Zhihui Zhu},
  journal={arXiv: Optimization and Control},
  year={2020}
}
Optimizing with group sparsity is significant for enhancing model interpretability in machine learning applications, e.g., model compression. However, for large-scale training problems, fast convergence and effective group-sparsity exploration are hard to achieve simultaneously in stochastic settings. In particular, existing state-of-the-art methods, e.g., Prox-SG, RDA, Prox-SVRG and Prox-Spider, usually generate merely dense solutions. To overcome this shortcoming, we propose a novel stochastic…
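
For context, the group-sparsity regularized problem referred to in the abstract is typically written as follows; this is a sketch in standard group-lasso notation, not the paper's own symbols:

\min_{x \in \mathbb{R}^n} \; f(x) + \lambda \sum_{g \in \mathcal{G}} \|x_g\|_2

where f is the (possibly nonconvex) training loss, \mathcal{G} partitions the variables into groups, and the mixed \ell_1/\ell_2 penalty drives entire groups x_g to zero. Proximal stochastic gradient methods handle the penalty via the block soft-thresholding operator

\operatorname{prox}_{\eta\lambda\|\cdot\|_2}(x_g) = \max\!\left(1 - \frac{\eta\lambda}{\|x_g\|_2},\, 0\right) x_g,

which, under stochastic gradient noise, rarely zeroes groups exactly; this is the dense-solution issue the abstract points to.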

Citations

CDFI: Compression-Driven Network Design for Frame Interpolation
TLDR
This work proposes a compression-driven network design for frame interpolation (CDFI) that leverages model pruning through sparsity-inducing optimization to significantly reduce the model size while achieving superior performance.

References

SHOWING 1-10 OF 35 REFERENCES
Orthant Based Proximal Stochastic Gradient Method for ℓ1-Regularized Optimization
TLDR
On a large number of convex problems, OBProx-SG comprehensively outperforms existing methods in terms of sparsity exploration and objective values, and experiments on non-convex deep neural networks further demonstrate its superiority by achieving solutions of much higher sparsity without sacrificing generalization accuracy.
A stochastic extra-step quasi-Newton method for nonsmooth nonconvex optimization
TLDR
A novel stochastic extra-step quasi-Newton method is developed to solve a class of nonsmooth nonconvex composite optimization problems, and it is shown to compare favorably with other state-of-the-art methods.
A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
TLDR
This work proposes a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+, which generalizes the best results given by the SCSG algorithm and achieves a global linear convergence rate without restart.
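As a reminder of the mechanism behind such variance-reduced proximal methods (a sketch in generic notation, not taken from the ProxSVRG+ paper itself), the plain stochastic gradient is replaced by an estimator anchored at a snapshot point \tilde{x}:

v_t = \nabla f_{I_t}(x_t) - \nabla f_{I_t}(\tilde{x}) + \nabla f(\tilde{x}), \qquad x_{t+1} = \operatorname{prox}_{\eta r}\!\left(x_t - \eta v_t\right),

where I_t is a sampled minibatch and r is the nonsmooth regularizer; in ProxSVRG+ the snapshot gradient may itself be estimated on a large minibatch rather than the full dataset.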
MRI reconstruction via enhanced group sparsity and nonconvex regularization
Adaptive Methods for Nonconvex Optimization
TLDR
The result implies that increasing the minibatch size enables convergence, thus providing a way to circumvent non-convergence issues, and a new adaptive optimization algorithm, Yogi, is provided, which controls the increase in effective learning rate, leading to even better performance with similar theoretical guarantees on convergence.
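For reference, the way Yogi tempers the growth of the effective learning rate can be sketched as follows (standard Adam-style notation, reproduced from memory rather than from this page, so treat it as an approximation):

Adam: \; v_t = v_{t-1} - (1-\beta_2)\,(v_{t-1} - g_t^2)
Yogi: \; v_t = v_{t-1} - (1-\beta_2)\,\operatorname{sign}(v_{t-1} - g_t^2)\, g_t^2

In Yogi the second-moment estimate v_t moves by at most (1-\beta_2)\, g_t^2 per step in either direction, so when g_t^2 is much smaller than v_{t-1} the estimate decreases slowly and the effective learning rate \eta/(\sqrt{v_t} + \epsilon) cannot increase abruptly.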
FaRSA for ℓ1-regularized convex optimization: local convergence and numerical experience
TLDR
An enhanced subproblem termination condition is introduced that allows us to prove that the iterates converge locally at a superlinear rate and the details of the publicly available C implementation are presented along with extensive numerical comparisons to other state-of-the-art solvers.
Efficient Online and Batch Learning Using Forward Backward Splitting
TLDR
The two-phase approach enables sparse solutions when used in conjunction with regularization functions that promote sparsity, such as ℓ1, ℓ2, squared ℓ2, and ℓ∞ regularization, and is extended and given efficient implementations for very high-dimensional sparse data.
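The "two phase" structure referred to here is forward-backward splitting; a generic sketch (my notation, not the paper's exact symbols):

x_{t+\frac{1}{2}} = x_t - \eta_t g_t, \qquad x_{t+1} = \arg\min_{x} \; \tfrac{1}{2}\|x - x_{t+\frac{1}{2}}\|_2^2 + \eta_t\, r(x)

That is, a gradient (forward) step on the loss is followed by a proximal (backward) step on the regularizer r; for r = \lambda\|\cdot\|_1 the second phase reduces to coordinate-wise soft-thresholding, which is what produces exact zeros.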
Group-Sparse Signal Denoising: Non-Convex Regularization, Convex Optimization
TLDR
This paper utilizes a non-convex regularization term chosen such that the total cost function (consisting of data consistency and regularization terms) is convex, so that sparsity is more strongly promoted than in the standard convex formulation, but without sacrificing the attractive aspects of convex optimization.
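A scalar illustration of this convexity-preserving construction (my own sketch, not the paper's multivariate formulation): for a denoising cost

F(x) = \tfrac{1}{2}(y - x)^2 + \lambda\,\phi(x),

with a non-convex penalty \phi, we have F''(x) = 1 + \lambda\,\phi''(x) wherever \phi is twice differentiable, so F remains convex as long as \phi''(x) \ge -1/\lambda; choosing the penalty's negative curvature to just meet this bound promotes sparsity more strongly than the ℓ1 norm while keeping the overall problem convex.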
Learning with dynamic group sparsity
TLDR
This paper has developed a new greedy sparse recovery algorithm, which prunes data residues in the iterative process according to both sparsity and group clustering priors rather than only sparsity as in previous methods.
“Active-set complexity” of proximal gradient: How long does it take to find the sparsity pattern?
TLDR
A bound is given on the active-set complexity of proximal gradient methods in the common case of minimizing the sum of a strongly-convex smooth function and a separable convex non-smooth function.
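To make the active-set notion concrete for the ℓ1 case (an illustrative sketch in my own notation, not the paper's), each proximal gradient step applies coordinate-wise soft-thresholding,

[x_{t+1}]_i = \operatorname{sign}\!\big([x_t - \eta \nabla f(x_t)]_i\big)\,\max\!\big(\,|[x_t - \eta \nabla f(x_t)]_i| - \eta\lambda,\; 0\big),

so coordinate i is set exactly to zero whenever |[x_t - \eta\nabla f(x_t)]_i| \le \eta\lambda; the active-set complexity bounds how many iterations are needed before the set of zero coordinates matches that of the optimal solution and stays fixed thereafter.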