# The Representation Jensen-Rényi Divergence

```bibtex
@inproceedings{Osorio2021TheRJ,
  title  = {The Representation Jensen-R\'enyi Divergence},
  author = {Jhoan Keider Hoyos Osorio and Oscar Skean and Austin J. Brockmeier and Luis Gonzalo S{\'a}nchez Giraldo},
  year   = {2021}
}
```

We introduce a divergence measure between data distributions based on operators in reproducing kernel Hilbert spaces (RKHS). The empirical estimator of the divergence is computed using the eigenvalues of positive definite Gram matrices obtained by evaluating the kernel over pairs of data points. The new measure shares properties with the Jensen-Shannon divergence. Convergence of the proposed estimators follows from concentration results based on the difference between the…
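
The construction described above can be sketched numerically. The following is a minimal illustration, not the paper's exact estimator: a matrix-based Rényi entropy computed from the eigenvalues of a trace-normalized RBF Gram matrix, with a Jensen-type divergence formed as the entropy of the pooled sample minus the mean entropy of the two samples. Equal mixture weights, a fixed bandwidth `sigma`, and the choice `alpha=2` are all simplifying assumptions.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """Trace-normalized RBF Gram matrix; its eigenvalues sum to one."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    K = np.exp(-D / (2.0 * sigma**2))
    return K / np.trace(K)

def renyi_entropy(A, alpha=2.0):
    """Matrix-based Renyi alpha-entropy from the eigenvalues of A."""
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]                  # discard numerical zeros
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

def jensen_renyi_divergence(X, Y, sigma=1.0, alpha=2.0):
    """Entropy of the pooled sample minus the mean entropy of the parts
    (equal mixture weights -- a simplification for illustration)."""
    mix = renyi_entropy(rbf_gram(np.vstack([X, Y]), sigma), alpha)
    return mix - 0.5 * (renyi_entropy(rbf_gram(X, sigma), alpha)
                        + renyi_entropy(rbf_gram(Y, sigma), alpha))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 2))
Y = rng.normal(3.0, 1.0, size=(100, 2))
print(jensen_renyi_divergence(X, X))  # ~0 for identical samples
print(jensen_renyi_divergence(X, Y))  # larger for well-separated samples
```

Pooling two identical samples leaves the nonzero Gram eigenvalues unchanged, so the divergence between a sample and itself is zero up to numerical error.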

## References

Showing 1–10 of 24 references.

### Measures of Entropy From Data Using Infinitely Divisible Kernels

- Computer Science · IEEE Transactions on Information Theory
- 2015

A framework to nonparametrically obtain measures of entropy directly from data, using operators in reproducing kernel Hilbert spaces defined by infinitely divisible kernels, is presented, and estimators of kernel-based conditional entropy and mutual information are also defined.
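
The mutual information estimator in this framework can be sketched as follows. This is an illustrative sketch, assuming the joint entropy is represented by the trace-normalized Hadamard (entrywise) product of the two Gram matrices, with an RBF kernel and `alpha=2` as arbitrary choices.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Trace-normalized RBF Gram matrix."""
    sq = np.sum(x**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    K = np.exp(-D / (2.0 * sigma**2))
    return K / np.trace(K)

def renyi_entropy(A, alpha=2.0):
    """Matrix-based Renyi alpha-entropy from eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

def mutual_information(x, y, sigma=1.0, alpha=2.0):
    """I(X;Y) = S(A) + S(B) - S(A o B), where the joint is the
    trace-normalized Hadamard product of the two Gram matrices."""
    A = rbf_gram(x, sigma)
    B = rbf_gram(y, sigma)
    AB = A * B
    joint = renyi_entropy(AB / np.trace(AB), alpha)
    return renyi_entropy(A, alpha) + renyi_entropy(B, alpha) - joint

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y_dep = x + 0.1 * rng.normal(size=(100, 1))   # strongly dependent
y_ind = rng.normal(size=(100, 1))             # independent of x
print(mutual_information(x, y_dep))  # relatively large
print(mutual_information(x, y_ind))  # close to zero
```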

### Optimal Transport in Reproducing Kernel Hilbert Spaces: Theory and Applications

- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2020

The case in which data distributions in the RKHS are Gaussian is explored, yielding closed-form expressions for both the estimated Wasserstein distance and the optimal transport map via kernel matrices; the Bures metric on covariance matrices is also generalized to infinite-dimensional settings, providing a new metric between covariance operators.
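
The finite-dimensional Bures metric referenced here has a closed form. A small sketch for positive semidefinite matrices follows; the Bures distance coincides with the 2-Wasserstein distance between zero-mean Gaussians with the given covariances.

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric PSD matrix square root via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)           # clamp tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def bures_distance(A, B):
    """Bures distance between PSD matrices A and B:
    sqrt( tr A + tr B - 2 tr (A^1/2 B A^1/2)^1/2 )."""
    rA = psd_sqrt(A)
    cross = psd_sqrt(rA @ B @ rA)
    val = np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)
    return np.sqrt(max(val, 0.0))       # guard against roundoff

print(bures_distance(np.eye(2), np.eye(2)))        # 0.0
print(bures_distance(np.eye(2), 4.0 * np.eye(2)))  # sqrt(2) ~ 1.414
```

For commuting (e.g. diagonal) matrices this reduces to the Euclidean distance between the matrix square roots, which is what the second example shows.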

### Metrics induced by Jensen-Shannon and related divergences on positive definite matrices

- Computer Science · Linear Algebra and its Applications
- 2019

### Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

- Computer Science · NIPS
- 2009

It is established that MMD corresponds to the optimal risk of a kernel classifier, thus forming a natural link between the distance between distributions and their ease of classification, and a generalization of the MMD is proposed for families of kernels.

### A metric approach toward point process divergence

- Computer Science · 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2011

This work addresses the problem of estimating the Jensen-Shannon divergence in a metric space using a nearest-neighbor-based approach, empirically demonstrates the validity of the proposed estimator, and compares it against other available methods in the context of the two-sample problem.
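
As a reference point for what such estimators target, the plug-in Jensen-Shannon divergence for discrete distributions (not the nearest-neighbor estimator itself) is simple to write down: the entropy of the mixture minus the mean entropy of the parts.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def jsd(p, q):
    """Jensen-Shannon divergence: H((p+q)/2) - (H(p) + H(q))/2."""
    m = 0.5 * (p + q)
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(jsd(p, p))  # 0.0
print(jsd(p, q))  # 0.5
```

With base-2 logarithms the JSD lies in [0, 1]; the half-overlapping distributions above land exactly at 0.5 bits.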

### Information Theoretic Learning - Renyi's Entropy and Kernel Perspectives

- Computer Science · Information Theoretic Learning
- 2010

Students, practitioners, and researchers interested in statistical signal processing, computational intelligence, and machine learning will find in this book the theory to understand the basics, the algorithms to implement applications, and exciting but still unexplored leads that provide fertile ground for future research.

### A Kernel Two-Sample Test

- Mathematics, Computer Science · J. Mach. Learn. Res.
- 2012

This work proposes a framework for analyzing and comparing distributions, which is used to construct statistical tests that determine whether two samples are drawn from different distributions, and presents two distribution-free tests based on large deviation bounds for the maximum mean discrepancy (MMD).
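
The unbiased squared-MMD statistic used in such two-sample tests is short to implement: average the within-sample kernel values (excluding the diagonal) and subtract twice the cross-sample average. A minimal sketch with an RBF kernel and a fixed bandwidth:

```python
import numpy as np

def rbf(X, Y, sigma=1.0):
    """RBF kernel matrix between the rows of X and Y."""
    d = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-d / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimator of squared maximum mean discrepancy."""
    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf(X, X, sigma), rbf(Y, Y, sigma), rbf(X, Y, sigma)
    np.fill_diagonal(Kxx, 0.0)          # exclude i == j terms
    np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1)) - 2.0 * Kxy.mean()

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (200, 1))
Y = rng.normal(2.0, 1.0, (200, 1))
Z = rng.normal(0.0, 1.0, (200, 1))
print(mmd2_unbiased(X, Z))  # near zero: same distribution
print(mmd2_unbiased(X, Y))  # clearly positive: shifted mean
```

Because the estimator is unbiased, it can take slightly negative values when the two samples come from the same distribution.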

### A new metric for probability distributions

- Computer Science · IEEE Transactions on Information Theory
- 2003

We introduce a metric for probability distributions, which is bounded, information-theoretically motivated, and has a natural Bayesian interpretation. The square root of the well-known χ…
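
One well-known metric of this kind is the square root of the Jensen-Shannon divergence. As a numerical sanity check (not a proof), the triangle inequality can be verified on random distribution triples:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def jsd(p, q):
    m = 0.5 * (p + q)
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    p, q, r = rng.dirichlet(np.ones(4), size=3)
    # sqrt(JSD); clamp tiny negative roundoff before the square root
    d = lambda a, b: np.sqrt(max(jsd(a, b), 0.0))
    ok &= d(p, r) <= d(p, q) + d(q, r) + 1e-12
print(ok)  # True: no triangle-inequality violation found
```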

### Random Features for Large-Scale Kernel Machines

- Computer Science · NIPS
- 2007

Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines on large-scale classification and regression tasks.
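
The random Fourier feature construction for the RBF kernel is compact: draw frequencies from a Gaussian whose scale is the inverse bandwidth, add random phases, and take cosines, so that inner products of the features approximate kernel values. A sketch:

```python
import numpy as np

def rff(X, D=2000, sigma=1.0, seed=0):
    """Random Fourier features z(x) with z(x).z(y) ~ exp(-|x-y|^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / sigma, size=(d, D))   # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
Z = rff(X, D=2000)
K_exact = np.exp(-np.sum((X[:, None] - X[None, :])**2, -1) / 2.0)
K_approx = Z @ Z.T
print(np.max(np.abs(K_exact - K_approx)))  # small approximation error
```

The entrywise error shrinks at roughly 1/sqrt(D), so increasing `D` trades memory for accuracy.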

### f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

- Computer Science · NIPS
- 2016

It is shown that any f-divergence can be used for training generative neural samplers, and the benefits of various choices of divergence function for training complexity and for the quality of the resulting generative models are discussed.
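
The f-divergence family underlying this approach is defined by a single convex generator f with f(1) = 0. The sketch below shows the definition for discrete distributions and checks that the generator f(t) = t log t recovers the KL divergence; it does not reproduce the paper's variational training procedure.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i) for discrete distributions
    (assumes q_i > 0 wherever p_i > 0)."""
    mask = q > 0
    return np.sum(q[mask] * f(p[mask] / q[mask]))

def kl_generator(t):
    """f(t) = t log t, whose f-divergence is the KL divergence."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] * np.log(t[pos])
    return out

p = np.array([0.4, 0.6])
q = np.array([0.5, 0.5])
direct_kl = np.sum(p * np.log(p / q))
print(f_divergence(p, q, kl_generator), direct_kl)  # identical values
```

Swapping in other generators (e.g. f(t) = -log t for reverse KL) yields the other members of the family with no change to `f_divergence`.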