Corpus ID: 250607413

Sparse solutions of the kernel herding algorithm by improved gradient approximation

@inproceedings{Tsuji2021SparseSO,
  title={Sparse solutions of the kernel herding algorithm by improved gradient approximation},
  author={Kazuma Tsuji and Ken’ichiro Tanaka},
  year={2021}
}
The kernel herding algorithm is used to construct quadrature rules in a reproducing kernel Hilbert space (RKHS). While the computational efficiency of the algorithm and stability of the output quadrature formulas are advantages of this method, the convergence speed of the integration error for a given number of nodes is slow compared to that of other quadrature methods. In this paper, we propose a modified kernel herding algorithm whose framework was introduced in a previous study and aim to… 
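
For context, the standard kernel herding update that this paper sets out to modify can be stated as follows; this is the textbook formulation (see, e.g., Chen et al., 2010), not the improved gradient approximation proposed here. Let $\mu_p = \int k(\cdot, x)\,\mathrm{d}p(x)$ denote the mean embedding of the target distribution $p$ in the RKHS $\mathcal{H}$ with kernel $k$. Kernel herding selects nodes greedily by
$$x_{T+1} = \operatorname*{arg\,max}_{x} \left[ \mathbb{E}_{x' \sim p}\, k(x, x') - \frac{1}{T+1} \sum_{t=1}^{T} k(x, x_t) \right],$$
and the resulting quadrature rule assigns equal weights $1/T$ to the selected nodes.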

References

Showing 1-10 of 38 references

On the Equivalence between Herding and Conditional Gradient Algorithms

The experiments indicate that while conditional gradient variants of the herding procedure of Welling (2009) can improve over herding on the task of approximating integrals, the original herding algorithm tends to approach the maximum entropy distribution more often, shedding more light on the learning bias behind herding.
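
For reference, the equivalence in question identifies herding with the conditional gradient (Frank-Wolfe) method applied to the squared RKHS distance to the mean embedding; in standard notation (my summary, not a quotation from the paper):
$$\min_{g \in \mathcal{M}} \tfrac{1}{2}\, \| g - \mu_p \|_{\mathcal{H}}^2, \qquad \mathcal{M} = \overline{\operatorname{conv}}\{ k(\cdot, x) : x \in \mathcal{X} \},$$
where a Frank-Wolfe step $g_{t+1} = (1 - \rho_t)\, g_t + \rho_t\, k(\cdot, x_{t+1})$ with step size $\rho_t = 1/(t+1)$ recovers the herding update with uniform weights.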

Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering

Experiments indicate an improvement in accuracy over random and quasi-Monte Carlo sampling on a robot localization task, and that the additional computational cost of generating the particles through optimization can be justified.

Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings

This paper reveals a condition on design points that makes Bayesian quadrature robust to misspecification, and shows that it may adaptively achieve the optimal rate of convergence in a Sobolev space of lesser order under a slightly stronger regularity condition on the integrand.

Boosting Frank-Wolfe by Chasing Gradients

This paper proposes to speed up the Frank-Wolfe algorithm by better aligning the descent direction with the negative gradient via a subroutine, and derives convergence rates ranging from $\mathcal{O}(1/t)$ to $\mathcal{O}(e^{-\omega t})$ for the method.
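
Schematically, the subroutine replaces the plain Frank-Wolfe direction with a direction built from the available atoms that better aligns with the negative gradient (my paraphrase of the criterion):
$$\operatorname{align}(d) = \frac{\langle -\nabla f(x_t),\, d \rangle}{\|\nabla f(x_t)\|\, \|d\|},$$
so the step is taken along a direction maximizing this alignment rather than along $v_t - x_t$ for the Frank-Wolfe vertex $v_t$.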

Performance analysis of greedy algorithms for minimising a Maximum Mean Discrepancy

It is shown that the finite-sample-size approximation error, measured by the MMD, decreases as 1/n for sequential Bayesian quadrature (SBQ), and also for kernel herding and greedy MMD minimisation when a suitable step-size sequence is used.
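
Concretely, the error measure here is the maximum mean discrepancy between the target $p$ and an $n$-point quadrature rule; in RKHS notation (my summary):
$$\mathrm{MMD}(Q_n, p) = \Big\| \mu_p - \sum_{i=1}^{n} w_i\, k(\cdot, x_i) \Big\|_{\mathcal{H}},$$
and the claim above is that this quantity decreases as $1/n$ for the stated methods.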

On the Global Linear Convergence of Frank-Wolfe Optimization Variants

This paper highlights and clarifies several variants of the Frank-Wolfe optimization algorithm that have been successfully applied in practice (away-steps FW, pairwise FW, fully-corrective FW, and Wolfe's minimum norm point algorithm), and proves for the first time that they all enjoy global linear convergence under a condition weaker than strong convexity of the objective.
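
These variants differ chiefly in the search direction; in standard notation (my summary, with $\mathcal{A}$ the atom set and $S_t$ the active set of atoms currently in use):
$$s_t = \operatorname*{arg\,min}_{s \in \mathcal{A}} \langle \nabla f(x_t), s \rangle, \qquad v_t = \operatorname*{arg\,max}_{v \in S_t} \langle \nabla f(x_t), v \rangle,$$
giving the Frank-Wolfe direction $s_t - x_t$, the away direction $x_t - v_t$, and the pairwise direction $s_t - v_t$.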

Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees

This paper considers the intermediate case of optimization over a convex cone, parametrized as the conic hull of a generic atom set, leading to the first principled definitions of non-negative matching pursuit (MP) algorithms, together with explicit convergence rates and excellent empirical performance.
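
The feasible set in question sits between the linear span used by classical matching pursuit and the convex hull used by Frank-Wolfe; schematically (my summary):
$$\operatorname{cone}(\mathcal{A}) = \Big\{ \sum_i \alpha_i a_i : \alpha_i \ge 0,\ a_i \in \mathcal{A} \Big\},$$
with a generic non-negative MP step selecting $z_t = \operatorname*{arg\,max}_{z \in \mathcal{A}} \langle -\nabla f(x_t), z \rangle$ and updating $x_{t+1} = x_t + \gamma_t z_t$ for a non-negative step size $\gamma_t$.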

Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization

A new general framework for convex optimization over matrix factorizations, in which every Frank-Wolfe iteration consists of a low-rank update, is presented, and the broad application areas of this approach are discussed.
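
In the matrix setting mentioned here, the low-rank structure comes from the linear minimization oracle: over the trace-norm ball $\{X : \|X\|_* \le \delta\}$ it returns a rank-one matrix (a standard fact in this line of work; my summary):
$$S_t = -\delta\, u_1 v_1^{\top}, \qquad X_{t+1} = (1 - \gamma_t) X_t + \gamma_t S_t,$$
where $u_1, v_1$ are the top singular vectors of $\nabla f(X_t)$, so each iterate is a sum of rank-one terms.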

Positively Weighted Kernel Quadrature via Subsampling

This approach combines the spectral properties of the kernel with recombination results about point measures, yielding effective algorithms that construct convex quadrature rules using only access to i.i.d. samples.

Super-Samples from Kernel Herding

The herding algorithm is extended to continuous spaces by using the kernel trick, and it is shown that kernel herding decreases the error of expectations of functions in the Hilbert space at a rate $O(1/T)$, which is much faster than the usual $O(1/\sqrt{T})$ for i.i.d. random samples.
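
A minimal sketch of this kernel-trick herding update in Python, assuming a Gaussian kernel, a uniform target over a finite candidate grid, and NumPy; the function names and parameters are illustrative choices, not the paper's setup.

import numpy as np

def gaussian_kernel(X, Y, sigma=0.5):
    # Pairwise Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def kernel_herding(candidates, n_samples, sigma=0.5):
    # Greedy herding: x_{t+1} = argmax_x E_p[k(x, X)] - (1/(t+1)) * sum_i k(x, x_i).
    # The target p is taken as uniform over `candidates`, so the mean
    # embedding is approximated by a row mean of the kernel matrix.
    K = gaussian_kernel(candidates, candidates, sigma)
    mean_embedding = K.mean(axis=1)        # E_p[k(x, X)] at each candidate
    herd_sum = np.zeros(len(candidates))   # running sum_i k(x, x_i)
    selected = []
    for t in range(n_samples):
        scores = mean_embedding - herd_sum / (t + 1)
        j = int(np.argmax(scores))
        selected.append(j)
        herd_sum += K[:, j]
    return candidates[selected]

# Usage: pick 20 herding nodes ("super-samples") from a grid on [0, 1]^2.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 30),
                            np.linspace(0, 1, 30)), -1).reshape(-1, 2)
nodes = kernel_herding(grid, 20)
print(nodes[:5])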