# The Representation Jensen-Rényi Divergence

```bibtex
@inproceedings{Osorio2021TheRJ,
  title  = {The Representation Jensen-R\'enyi Divergence},
  author = {Jhoan Keider Hoyos Osorio and Oscar Skean and Austin J. Brockmeier and Luis Gonzalo S{\'a}nchez Giraldo},
  year   = {2021}
}
```

We introduce a divergence measure between data distributions based on operators in reproducing kernel Hilbert spaces (RKHS). The empirical estimator of the divergence is computed using the eigenvalues of positive definite Gram matrices obtained by evaluating the kernel over pairs of data points. The new measure shares properties with the Jensen-Shannon divergence. Convergence of the proposed estimators follows from concentration results based on the difference between the…
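
The construction described above can be sketched numerically. The following is a minimal illustration, not the paper's exact estimator: a matrix-based Rényi entropy computed from the eigenvalues of a trace-normalized RBF Gram matrix, with a Jensen-type divergence formed as the entropy of the pooled sample minus the mean entropy of the two samples. Equal mixture weights, a fixed bandwidth `sigma`, and the choice `alpha=2` are all simplifying assumptions.

```python
import numpy as np

def rbf_gram(X, sigma=1.0):
    """Trace-normalized RBF Gram matrix; its eigenvalues sum to one."""
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    K = np.exp(-D / (2.0 * sigma**2))
    return K / np.trace(K)

def renyi_entropy(A, alpha=2.0):
    """Matrix-based Renyi alpha-entropy from the eigenvalues of A."""
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]                  # discard numerical zeros
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

def jensen_renyi_divergence(X, Y, sigma=1.0, alpha=2.0):
    """Entropy of the pooled sample minus the mean entropy of the parts
    (equal mixture weights -- a simplification for illustration)."""
    mix = renyi_entropy(rbf_gram(np.vstack([X, Y]), sigma), alpha)
    return mix - 0.5 * (renyi_entropy(rbf_gram(X, sigma), alpha)
                        + renyi_entropy(rbf_gram(Y, sigma), alpha))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 2))
Y = rng.normal(3.0, 1.0, size=(100, 2))
print(jensen_renyi_divergence(X, X))  # ~0 for identical samples
print(jensen_renyi_divergence(X, Y))  # larger for well-separated samples
```

Pooling two identical samples leaves the nonzero Gram eigenvalues unchanged, so the divergence between a sample and itself is zero up to numerical error.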

## References

Showing 1–10 of 24 references.

### Measures of Entropy From Data Using Infinitely Divisible Kernels

- Computer Science · IEEE Transactions on Information Theory
- 2015

A framework to nonparametrically obtain measures of entropy directly from data, using operators in reproducing kernel Hilbert spaces defined by infinitely divisible kernels, is presented, and estimators of kernel-based conditional entropy and mutual information are also defined.
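
The mutual information estimator in this framework can be sketched as follows. This is an illustrative sketch, assuming the joint entropy is represented by the trace-normalized Hadamard (entrywise) product of the two Gram matrices, with an RBF kernel and `alpha=2` as arbitrary choices.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Trace-normalized RBF Gram matrix."""
    sq = np.sum(x**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    K = np.exp(-D / (2.0 * sigma**2))
    return K / np.trace(K)

def renyi_entropy(A, alpha=2.0):
    """Matrix-based Renyi alpha-entropy from eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    lam = lam[lam > 1e-12]
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

def mutual_information(x, y, sigma=1.0, alpha=2.0):
    """I(X;Y) = S(A) + S(B) - S(A o B), where the joint is the
    trace-normalized Hadamard product of the two Gram matrices."""
    A = rbf_gram(x, sigma)
    B = rbf_gram(y, sigma)
    AB = A * B
    joint = renyi_entropy(AB / np.trace(AB), alpha)
    return renyi_entropy(A, alpha) + renyi_entropy(B, alpha) - joint

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y_dep = x + 0.1 * rng.normal(size=(100, 1))   # strongly dependent
y_ind = rng.normal(size=(100, 1))             # independent of x
print(mutual_information(x, y_dep))  # relatively large
print(mutual_information(x, y_ind))  # close to zero
```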

### Optimal Transport in Reproducing Kernel Hilbert Spaces: Theory and Applications

- Computer Science · IEEE Transactions on Pattern Analysis and Machine Intelligence
- 2020

The case in which data distributions in the RKHS are Gaussian is explored, yielding closed-form expressions for both the estimated Wasserstein distance and the optimal transport map via kernel matrices; the Bures metric on covariance matrices is also generalized to infinite-dimensional settings, providing a new metric between covariance operators.
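
The finite-dimensional Bures metric referenced here has a closed form. A small sketch for positive semidefinite matrices follows; the Bures distance coincides with the 2-Wasserstein distance between zero-mean Gaussians with the given covariances.

```python
import numpy as np

def psd_sqrt(M):
    """Symmetric PSD matrix square root via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)           # clamp tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def bures_distance(A, B):
    """Bures distance between PSD matrices A and B:
    sqrt( tr A + tr B - 2 tr (A^1/2 B A^1/2)^1/2 )."""
    rA = psd_sqrt(A)
    cross = psd_sqrt(rA @ B @ rA)
    val = np.trace(A) + np.trace(B) - 2.0 * np.trace(cross)
    return np.sqrt(max(val, 0.0))       # guard against roundoff

print(bures_distance(np.eye(2), np.eye(2)))        # 0.0
print(bures_distance(np.eye(2), 4.0 * np.eye(2)))  # sqrt(2) ~ 1.414
```

For commuting (e.g. diagonal) matrices this reduces to the Euclidean distance between the matrix square roots, which is what the second example shows.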

### Metrics induced by Jensen-Shannon and related divergences on positive definite matrices

- Computer Science · Linear Algebra and its Applications
- 2019

### Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

- Computer Science · NIPS
- 2009

It is established that MMD corresponds to the optimal risk of a kernel classifier, thus forming a natural link between the distance between distributions and their ease of classification, and a generalization of the MMD is proposed for families of kernels.

### A metric approach toward point process divergence

- Computer Science · 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2011

This work addresses the problem of estimating the Jensen-Shannon divergence in a metric space using a nearest-neighbor-based approach, empirically demonstrates the validity of the proposed estimator, and compares it against other available methods in the context of the two-sample problem.
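
As a reference point for what such estimators target, the plug-in Jensen-Shannon divergence for discrete distributions (not the nearest-neighbor estimator itself) is simple to write down: the entropy of the mixture minus the mean entropy of the parts.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def jsd(p, q):
    """Jensen-Shannon divergence: H((p+q)/2) - (H(p) + H(q))/2."""
    m = 0.5 * (p + q)
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(jsd(p, p))  # 0.0
print(jsd(p, q))  # 0.5
```

With base-2 logarithms the JSD lies in [0, 1]; the half-overlapping distributions above land exactly at 0.5 bits.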

### Information Theoretic Learning - Renyi's Entropy and Kernel Perspectives

- Computer Science · Information Theoretic Learning
- 2010

Students, practitioners, and researchers interested in statistical signal processing, computational intelligence, and machine learning will find in this book the theory to understand the basics, the algorithms to implement applications, and exciting but still unexplored leads that provide fertile ground for future research.

### A Kernel Two-Sample Test

- Mathematics, Computer Science · J. Mach. Learn. Res.
- 2012

This work proposes a framework for analyzing and comparing distributions, which is used to construct statistical tests that determine whether two samples are drawn from different distributions, and presents two distribution-free tests based on large deviation bounds for the maximum mean discrepancy (MMD).
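
The unbiased squared-MMD statistic used in such two-sample tests is short to implement: average the within-sample kernel values (excluding the diagonal) and subtract twice the cross-sample average. A minimal sketch with an RBF kernel and a fixed bandwidth:

```python
import numpy as np

def rbf(X, Y, sigma=1.0):
    """RBF kernel matrix between the rows of X and Y."""
    d = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-d / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimator of squared maximum mean discrepancy."""
    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf(X, X, sigma), rbf(Y, Y, sigma), rbf(X, Y, sigma)
    np.fill_diagonal(Kxx, 0.0)          # exclude i == j terms
    np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1)) - 2.0 * Kxy.mean()

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (200, 1))
Y = rng.normal(2.0, 1.0, (200, 1))
Z = rng.normal(0.0, 1.0, (200, 1))
print(mmd2_unbiased(X, Z))  # near zero: same distribution
print(mmd2_unbiased(X, Y))  # clearly positive: shifted mean
```

Because the estimator is unbiased, it can take slightly negative values when the two samples come from the same distribution.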

### A new metric for probability distributions

- Computer Science · IEEE Transactions on Information Theory
- 2003

We introduce a metric for probability distributions, which is bounded, information-theoretically motivated, and has a natural Bayesian interpretation. The square root of the well-known χ…
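
One well-known metric of this kind is the square root of the Jensen-Shannon divergence. As a numerical sanity check (not a proof), the triangle inequality can be verified on random distribution triples:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def jsd(p, q):
    m = 0.5 * (p + q)
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    p, q, r = rng.dirichlet(np.ones(4), size=3)
    # sqrt(JSD); clamp tiny negative roundoff before the square root
    d = lambda a, b: np.sqrt(max(jsd(a, b), 0.0))
    ok &= d(p, r) <= d(p, q) + d(q, r) + 1e-12
print(ok)  # True: no triangle-inequality violation found
```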

### Random Features for Large-Scale Kernel Machines

- Computer Science · NIPS
- 2007

Two sets of random features are explored, convergence bounds are provided on their ability to approximate various radial basis kernels, and it is shown that linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines on large-scale classification and regression tasks.
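
The random Fourier feature construction for the RBF kernel is compact: draw frequencies from a Gaussian whose scale is the inverse bandwidth, add random phases, and take cosines, so that inner products of the features approximate kernel values. A sketch:

```python
import numpy as np

def rff(X, D=2000, sigma=1.0, seed=0):
    """Random Fourier features z(x) with z(x).z(y) ~ exp(-|x-y|^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / sigma, size=(d, D))   # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
Z = rff(X, D=2000)
K_exact = np.exp(-np.sum((X[:, None] - X[None, :])**2, -1) / 2.0)
K_approx = Z @ Z.T
print(np.max(np.abs(K_exact - K_approx)))  # small approximation error
```

The entrywise error shrinks at roughly 1/sqrt(D), so increasing `D` trades memory for accuracy.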

### f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

- Computer Science · NIPS
- 2016

It is shown that any f-divergence can be used for training generative neural samplers, and the benefits of various choices of divergence function for training complexity and for the quality of the resulting generative models are discussed.
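
The f-divergence family underlying this approach is defined by a single convex generator f with f(1) = 0. The sketch below shows the definition for discrete distributions and checks that the generator f(t) = t log t recovers the KL divergence; it does not reproduce the paper's variational training procedure.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i) for discrete distributions
    (assumes q_i > 0 wherever p_i > 0)."""
    mask = q > 0
    return np.sum(q[mask] * f(p[mask] / q[mask]))

def kl_generator(t):
    """f(t) = t log t, whose f-divergence is the KL divergence."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = t[pos] * np.log(t[pos])
    return out

p = np.array([0.4, 0.6])
q = np.array([0.5, 0.5])
direct_kl = np.sum(p * np.log(p / q))
print(f_divergence(p, q, kl_generator), direct_kl)  # identical values
```

Swapping in other generators (e.g. f(t) = -log t for reverse KL) yields the other members of the family with no change to `f_divergence`.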