• Corpus ID: 246285569

Towards Sharp Stochastic Zeroth Order Hessian Estimators over Riemannian Manifolds

@article{Wang2022TowardsSS,
  title={Towards Sharp Stochastic Zeroth Order Hessian Estimators over Riemannian Manifolds},
  author={Tianyu Wang},
  journal={ArXiv},
  year={2022},
  volume={abs/2201.10780}
}
  • Tianyu Wang
  • Published 26 January 2022
  • Mathematics, Computer Science
  • ArXiv
We study Hessian estimators for real-valued functions defined over an n-dimensional complete Riemannian manifold. We introduce new stochastic zeroth-order Hessian estimators using O(1) function evaluations. We show that, for a smooth real-valued function f with Lipschitz Hessian (with respect to the Riemannian metric), our estimator achieves a bias bound of order O(L_2 δ + γ δ^2), where L_2 is the Lipschitz constant for the Hessian, γ depends on both the Levi-Civita connection and function f… 
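The construction below is a minimal Euclidean (R^n) sketch of an O(1)-evaluation stochastic Hessian estimator of this flavor, written in Python. It is illustrative only: the function names and constants are assumptions, and the paper's actual estimator works intrinsically on the manifold (finite differences along geodesics via the exponential map), which this sketch does not attempt.

```python
import numpy as np

def zo_hessian_estimate(f, x, delta=1e-3, rng=None):
    """One stochastic zeroth-order Hessian estimate at x using 4 function values.

    Illustrative Euclidean sketch (not the paper's Riemannian estimator):
    a central finite difference approximates the mixed directional second
    derivative u^T H v along two independent random unit directions, then the
    result is rescaled so its expectation is (approximately) the Hessian.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    u = rng.standard_normal(n); u /= np.linalg.norm(u)  # uniform on the sphere
    v = rng.standard_normal(n); v /= np.linalg.norm(v)
    d2 = (f(x + delta*u + delta*v) - f(x + delta*u - delta*v)
          - f(x - delta*u + delta*v) + f(x - delta*u - delta*v)) / (4 * delta**2)
    # E[(u^T H v) u v^T] = H / n^2 for independent sphere directions, so rescale
    # by n^2 and symmetrize; the bias comes from the finite-difference step delta.
    return 0.5 * n**2 * d2 * (np.outer(u, v) + np.outer(v, u))

# Averaging many such O(1)-evaluation estimates reduces variance:
# H_hat = np.mean([zo_hessian_estimate(f, x) for _ in range(1000)], axis=0)
```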

Citations

Stochastic Zeroth Order Gradient and Hessian Estimators: Variance Reduction and Refined Bias Bounds
We study stochastic zeroth order gradient and Hessian estimators for real-valued functions in R^n. We show that, by taking finite differences along random orthogonal directions, the variance of the …
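A minimal sketch of that idea, assuming the standard construction of averaging symmetric finite differences over a few random orthogonal directions (the exact estimator and scaling in that paper may differ):

```python
import numpy as np

def zo_gradient_orthogonal(f, x, k=4, delta=1e-4, rng=None):
    """Zeroth-order gradient estimate from finite differences along k random
    orthogonal directions (illustrative sketch; requires k <= len(x))."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    # Columns of Q are k orthonormal directions from a Haar-random rotation.
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    g = np.zeros(n)
    for i in range(k):
        u = Q[:, i]
        g += (f(x + delta*u) - f(x - delta*u)) / (2 * delta) * u
    # Each direction satisfies E[u u^T] = I / n, so n/k makes the estimate
    # (approximately) unbiased for the gradient.
    return (n / k) * g
```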

References

SHOWING 1-10 OF 23 REFERENCES
Stochastic Zeroth-order Riemannian Derivative Estimation and Optimization
TLDR
The proposed estimators overcome the difficulty of the non-linearity of the manifold constraint and the issues that arise in using Euclidean Gaussian smoothing techniques when the function is defined only over the manifold.
Zeroth-Order Nonconvex Stochastic Optimization: Handling Constraints, High Dimensionality, and Saddle Points
In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization, with a focus on addressing constrained optimization, high-dimensional …
Random Gradient-Free Minimization of Convex Functions
TLDR
New complexity bounds are proved for methods of convex optimization that are based only on computation of the function value; it appears that such methods usually need at most n times more iterations than the standard gradient methods, where n is the dimension of the space of variables.
Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations
TLDR
Focusing on nonasymptotic bounds on convergence rates, it is shown that if pairs of function values are available, algorithms for d-dimensional optimization that use gradient estimates based on random perturbations suffer a factor of at most √d in convergence rate over traditional stochastic gradient methods.
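The two-point estimator behind such results is typically the following single-direction construction; the sketch below (hypothetical names, plain gradient-descent-style step) illustrates the idea rather than the exact algorithm analyzed in that paper. The extra √d factor in the rates reflects the variance of this estimator.

```python
import numpy as np

def two_point_sgd_step(f, x, lr=0.1, delta=1e-4, rng=None):
    """One gradient-descent-style step using only a pair of function values."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    u = rng.standard_normal(d); u /= np.linalg.norm(u)  # random unit direction
    # E[d * (u^T grad f) u] = grad f, so g_hat is nearly unbiased (O(delta^2) bias).
    g_hat = d * (f(x + delta*u) - f(x - delta*u)) / (2 * delta) * u
    return x - lr * g_hat
```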
Riemannian Geometry
The recent physical interpretation of intrinsic differential geometry of spaces has stimulated the study of this subject. Riemann proposed the generalisation, to spaces of any order, of Gauss's …
Cubic regularization of Newton method and its global performance
TLDR
This paper provides a theoretical analysis of a cubic regularization of the Newton method as applied to the unconstrained minimization problem and proves general local convergence results for this scheme.
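For reference, the per-iteration step analyzed in this line of work minimizes a cubic-regularized second-order model; this is the standard formulation (with M an upper bound on the Lipschitz constant of the Hessian), stated here as a reminder rather than a claim about the paper's exact notation:

```latex
x_{k+1} \in \operatorname*{arg\,min}_{y}\;
  f(x_k) + \langle \nabla f(x_k),\, y - x_k \rangle
  + \tfrac{1}{2} \langle \nabla^2 f(x_k)(y - x_k),\, y - x_k \rangle
  + \tfrac{M}{6}\, \| y - x_k \|^3
```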
Adaptive stochastic approximation by the simultaneous perturbation method
  • J. Spall
  • Computer Science
  • IEEE Trans. Autom. Control.
  • 2000
TLDR
This paper presents a general adaptive SA algorithm that is based on an easy method for estimating the Hessian matrix at each iteration while concurrently estimating the primary parameters of interest.
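A rough sketch of a simultaneous-perturbation Hessian estimate in the spirit of that algorithm, assuming Rademacher perturbations and one-sided simultaneous-perturbation gradient estimates; the smoothing, averaging, and step-size machinery of the actual adaptive SPSA algorithm is more involved than this.

```python
import numpy as np

def sp_hessian_estimate(f, theta, c=1e-2, c_tilde=1e-2, rng=None):
    """Per-iteration simultaneous-perturbation Hessian estimate (rough sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    p = theta.size
    delta = rng.choice([-1.0, 1.0], size=p)     # Rademacher perturbation of theta
    delta_t = rng.choice([-1.0, 1.0], size=p)   # perturbation for gradient estimates

    def sp_grad(z):
        # One-sided simultaneous-perturbation gradient estimate at z.
        return (f(z + c_tilde * delta_t) - f(z)) / (c_tilde * delta_t)

    # Difference of gradient estimates across theta +/- c*delta is ~ 2c * H @ delta.
    dg = sp_grad(theta + c * delta) - sp_grad(theta - c * delta)
    outer = np.outer(dg / (2 * c), 1.0 / delta)  # expectation ~ Hessian
    return 0.5 * (outer + outer.T)               # symmetrize
```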
Second-Order Stochastic Optimization for Machine Learning in Linear Time
TLDR
This paper develops second-order stochastic methods for optimization problems in machine learning that match the per-iteration cost of gradient-based methods and, in certain settings, improve upon the overall running time of popular first-order methods.
A Simplex Method for Function Minimization
A method is described for the minimization of a function of n variables, which depends on the comparison of function values at the (n + 1) vertices of a general simplex, followed by the replacement of …
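This is the Nelder–Mead simplex method; a quick way to try it is through SciPy's implementation (the quadratic objective below is just a stand-in example):

```python
import numpy as np
from scipy.optimize import minimize

def objective(x):  # stand-in example objective
    return np.sum((x - np.array([1.0, -2.0, 0.5]))**2)

result = minimize(objective, x0=np.zeros(3), method="Nelder-Mead")
print(result.x, result.fun)  # approximate minimizer and its value
```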
Online convex optimization in the bandit setting: gradient descent without a gradient
TLDR
It is possible to use gradient descent without seeing anything more than the value of the function at a single point, and the guarantees hold even in the most general case: online against an adaptive adversary.
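The usual construction behind such bandit results is a one-point gradient estimate: a single evaluation at a randomly perturbed point gives an unbiased estimate of the gradient of a smoothed version of f. A minimal sketch under that assumption (not the exact algorithm from the paper):

```python
import numpy as np

def one_point_gradient(f, x, delta=1e-2, rng=None):
    """One-point gradient estimate from a single function value (sketch).

    E[(d/delta) * f(x + delta*u) * u], with u uniform on the unit sphere,
    equals the gradient of a smoothed version of f at x.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    u = rng.standard_normal(d); u /= np.linalg.norm(u)
    return (d / delta) * f(x + delta * u) * u
```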
...