# A representer theorem for deep kernel learning

@article{Bohn2019ART, title={A representer theorem for deep kernel learning}, author={Bastian Bohn and Michael Griebel and Christian Rieger}, journal={ArXiv}, year={2019}, volume={abs/1709.10441} }

In this paper we provide a representer theorem for the concatenation of (linear combinations of) kernel functions of reproducing kernel Hilbert spaces. This result serves as mathematical foundation for the analysis of machine learning algorithms based on compositions of functions. As a direct consequence, the corresponding infinite-dimensional minimization problems can be recast into (nonlinear) finite-dimensional minimization problems, which can be tackled with nonlinear optimization…
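The reduction described above can be illustrated in the classical single-layer case: by the standard representer theorem, the minimizer of a regularized empirical risk over an RKHS is a linear combination of kernel evaluations at the training points, so only finitely many coefficients need to be computed. A minimal NumPy sketch for kernel ridge regression (the function names and parameter values here are illustrative, not taken from the paper):

```python
import numpy as np

# Classical (single-layer) illustration of the representer theorem:
# the minimizer of  sum_i (f(x_i) - y_i)^2 + lam * ||f||_H^2  over an RKHS
# has the form  f(x) = sum_i alpha_i * k(x_i, x),  so the infinite-dimensional
# problem reduces to solving for n coefficients alpha.

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between row-vector sets A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def fit_kernel_ridge(X, y, lam=1e-2, gamma=1.0):
    K = rbf_kernel(X, X, gamma)
    # The finite-dimensional problem: (K + lam*I) alpha = y
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return lambda Xnew: rbf_kernel(Xnew, X, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.standard_normal(40)
f = fit_kernel_ridge(X, y)
print(np.max(np.abs(f(X) - y)))  # training residual
```

The deep variant studied in the paper extends this reduction to concatenations of such kernel expansions, which yields a nonlinear finite-dimensional optimization problem instead of the linear system above.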

## 27 Citations

### What is an RKHS?

- Mathematics, Computer Science
- 2019

The representer theorem for various learning problems under the reproducing kernel Hilbert space framework is reviewed, with solutions to penalized least squares and penalized likelihood for nonparametric regression, and support vector machines for classification as a solution to the penalized hinge loss.

### Universality and Optimality of Structured Deep Kernel Networks

- Computer Science, ArXiv
- 2021

A recent deep kernel representer theorem is leveraged to connect the two approaches and understand their interplay, showing that the use of special types of kernels yields models reminiscent of neural networks that are founded in the same theoretical framework as classical kernel methods, while enjoying many computational properties of deep neural networks.

### An Efficient Empirical Solver for Localized Multiple Kernel Learning via DNNs

- Computer Science, 2020 25th International Conference on Pattern Recognition (ICPR)
- 2021

This paper proposes parameterizing the gating function for learning kernel combination weights and the multiclass classifier using an attentional network (AN) and a multilayer perceptron (MLP), respectively, to help understand how the network solves the problem.

### What Kinds of Functions do Deep Neural Networks Learn? Insights from Variational Spline Theory

- Computer Science, SIAM J. Math. Data Sci.
- 2022

A new function space, which is reminiscent of classical bounded variation spaces, that captures the compositional structure associated with deep neural networks is proposed, and a representer theorem is derived showing that deep ReLU networks are solutions to regularized data fitting problems in this function space.

### A Unifying Representer Theorem for Inverse Problems and Machine Learning

- Mathematics, Found. Comput. Math.
- 2021

A general representer theorem is presented that characterizes the solutions of a remarkably broad class of optimization problems and is used to retrieve a number of known results in the literature, e.g., the celebrated representer theorem of machine learning for RKHSs, Tikhonov regularization, representer theorems for sparsity-promoting functionals, and the recovery of spikes.

### LMKL-Net: A Fast Localized Multiple Kernel Learning Solver via Deep Neural Networks

- Computer Science, ArXiv
- 2018

Overall, LMKL-Net not only outperforms state-of-the-art MKL solvers in terms of accuracy, but can also be trained about two orders of magnitude faster with a much smaller memory footprint for large-scale learning.

### Learning rates for the kernel regularized regression with a differentiable strongly convex loss

- Computer Science, Communications on Pure & Applied Analysis
- 2020

Learning rates are provided for the cases where the hypothesis RKHS's logarithmic complexity exponent is arbitrarily small as well as sufficiently large, and robustness with respect to the maximum mean discrepancy and the Hutchinson metric is shown.

### A kernel‐expanded stochastic neural network

- Computer Science, ArXiv
- 2022

The so-called kernel-expanded stochastic neural network (K-StoNet) model is proposed, which incorporates support vector regression as the first hidden layer, reformulates the neural network as a latent variable model, and possesses a theoretical guarantee of asymptotic convergence to the global optimum.

### A Kernel Perspective for the Decision Boundary of Deep Neural Networks

- Computer Science, 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI)
- 2020

It is argued that the multi-layer nonlinear feature transformation in deep neural networks is equivalent to a kernel feature mapping, which is analyzed from the perspective of the unique mathematical advantages of kernel methods and the method of constructing multi-layer kernel machines.

### Deep Spectral Kernel Learning

- Computer Science, IJCAI
- 2019

A novel deep spectral kernel network (DSKN) is proposed to naturally integrate nonstationary and non-monotonic spectral kernels into elegant deep architectures in an interpretable way, which can be further generalized to cover most kernels.

## References

Showing 1–10 of 31 references

### Kernel Methods for Deep Learning

- Computer Science, NIPS
- 2009

A new family of positive-definite kernel functions that mimic the computation in large, multilayer neural nets is introduced; these kernels can be used in shallow architectures, such as support vector machines (SVMs), or in deep kernel-based architectures that the authors call multilayer kernel machines (MKMs).
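As a concrete illustration of such a kernel, the degree-one arc-cosine kernel from this line of work mimics an infinitely wide single layer of rectified units. A short sketch (the closed form follows Cho & Saul's construction; the implementation details and tolerances are our own assumptions):

```python
import numpy as np

def arc_cosine_kernel(X, Y):
    """Degree-one arc-cosine kernel:
        k(x, y) = (1/pi) * ||x|| * ||y|| * (sin t + (pi - t) * cos t),
    where t is the angle between x and y. It corresponds to the inner
    product of an infinitely wide layer of rectified (ReLU-like) units."""
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    cos_t = np.clip((X @ Y.T) / np.outer(nx, ny), -1.0, 1.0)
    t = np.arccos(cos_t)
    return (1.0 / np.pi) * np.outer(nx, ny) * (np.sin(t) + (np.pi - t) * np.cos(t))

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 5))
K = arc_cosine_kernel(X, X)
# Positive semi-definiteness (up to numerical tolerance)
print(float(np.min(np.linalg.eigvalsh(K))))
```

Stacking this kernel, i.e., recomputing the angle from the normalized kernel values of the previous layer, is what produces the "deep" MKM variants described above.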

### Deep Kernel Learning

- Computer Science, AISTATS
- 2016

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs…

### On Learning Vector-Valued Functions

- Mathematics, Computer Science, Neural Computation
- 2005

This letter provides a study of learning in a Hilbert space of vector-valued functions, derives the form of the minimal norm interpolant to a finite set of data, and applies it to study some regularization functionals that are important in learning theory.

### Deep Multiple Kernel Learning

- Computer Science, 2013 12th International Conference on Machine Learning and Applications
- 2013

This paper combines kernels at each layer and then optimizes over an estimate of the support vector machine leave-one-out error rather than the dual objective function, improving performance on a variety of datasets.

### Multiple kernel learning, conic duality, and the SMO algorithm

- Computer Science, ICML
- 2004

Experimental results show that the proposed dual formulation of the QCQP as a second-order cone programming problem is significantly more efficient than the general-purpose interior point methods available in current optimization toolboxes.

### Two-Layer Multiple Kernel Learning

- Computer Science, AISTATS
- 2011

This paper investigates a framework of Multi-Layer Multiple Kernel Learning that aims to learn “deep” kernel machines by exploring the combinations of multiple kernels in a multi-layer structure, which goes beyond the conventional MKL approach.

### Deep multilayer multiple kernel learning

- Computer Science, Neural Computing and Applications
- 2015

This paper proposes to optimize the network within an adaptive backpropagation MLMKL framework using gradient ascent, instead of the dual objective function or an estimate of the leave-one-out error, and achieves high performance.

### Explaining nonlinear classification decisions with deep Taylor decomposition

- Computer Science, Pattern Recognit.
- 2017

### Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond

- Computer Science, IEEE Transactions on Neural Networks
- 2005

This book is an excellent choice for readers who wish to familiarize themselves with computational intelligence techniques or for an overview/introductory course in the field of computational intelligence.

### Approximation capabilities of multilayer feedforward networks

- Computer Science, Neural Networks
- 1991