Corpus ID: 245353945

The Representation Jensen-Rényi Divergence

@inproceedings{Osorio2021TheRJ,
  title={The Representation Jensen-R\'enyi Divergence},
  author={Jhoan Keider Hoyos Osorio and Oscar Skean and Austin J. Brockmeier and Luis Gonzalo S{\'a}nchez Giraldo},
  year={2021}
}
We introduce a divergence measure between data distributions based on operators in reproducing kernel Hilbert spaces defined by kernels. The empirical estimator of the divergence is computed using the eigenvalues of positive definite Gram matrices, which are obtained by evaluating the kernel over pairs of data points. The new measure has properties similar to those of the Jensen-Shannon divergence. Convergence of the proposed estimators follows from concentration results based on the difference between the…
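A minimal sketch of how such an eigenvalue-based estimator can be computed in practice: trace-normalize the Gram matrix of each sample and of the pooled sample, take eigenvalues, and plug them into a Rényi-type entropy. The RBF kernel, the bandwidth sigma, the entropy order alpha, and the pooled-sample construction below are illustrative assumptions, not the paper's exact estimator.

```python
# Hedged sketch of a kernel-based Jensen-Renyi divergence estimate computed
# from the eigenvalues of trace-normalized Gram matrices. Kernel choice,
# bandwidth, and normalization are assumptions, not the paper's definitions.
import numpy as np
from scipy.spatial.distance import cdist

def gram_rbf(X, Y, sigma=1.0):
    """RBF Gram matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-cdist(X, Y, "sqeuclidean") / (2.0 * sigma**2))

def renyi_entropy_from_gram(K, alpha=2.0, eps=1e-12):
    """Renyi-type entropy of order alpha from a trace-normalized Gram matrix."""
    A = K / np.trace(K)                          # eigenvalues now sum to 1
    lam = np.clip(np.linalg.eigvalsh(A), eps, None)
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

def jensen_renyi_divergence(X, Y, alpha=2.0, sigma=1.0):
    """Entropy of the pooled sample minus the average entropy of each sample."""
    Kx = gram_rbf(X, X, sigma)
    Ky = gram_rbf(Y, Y, sigma)
    Kz = gram_rbf(np.vstack([X, Y]), np.vstack([X, Y]), sigma)
    return renyi_entropy_from_gram(Kz, alpha) - 0.5 * (
        renyi_entropy_from_gram(Kx, alpha) + renyi_entropy_from_gram(Ky, alpha)
    )

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(200, 2))
    Y = rng.normal(1.5, 1.0, size=(200, 2))
    print(jensen_renyi_divergence(X, Y, alpha=2.0, sigma=1.0))
```

With equal sample sizes, the pooled Gram matrix corresponds to the empirical mixture of the two samples, which is what makes the Jensen-type construction (mixture entropy minus average entropy) meaningful.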


References

Showing 1-10 of 24 references

Measures of Entropy From Data Using Infinitely Divisible Kernels

This work presents a framework for nonparametrically obtaining measures of entropy directly from data using operators in reproducing kernel Hilbert spaces defined by infinitely divisible kernels, and also defines estimators of kernel-based conditional entropy and mutual information.
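For context, the entropy functional in this framework is computed directly from Gram-matrix eigenvalues; the exact normalization below is my recollection of the definition and should be checked against the paper.

```latex
% Matrix-based Renyi-type entropy of order \alpha, computed from a Gram matrix
% K with K_{ij} = \kappa(x_i, x_j), normalized to a unit-trace matrix A
% (recollected definition, not a verbatim quotation of the paper):
\[
S_\alpha(A) \;=\; \frac{1}{1-\alpha}\,\log_2\!\big(\operatorname{tr}(A^\alpha)\big)
            \;=\; \frac{1}{1-\alpha}\,\log_2\!\Big(\sum_{i=1}^{n} \lambda_i(A)^{\alpha}\Big)
\]
```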

Optimal Transport in Reproducing Kernel Hilbert Spaces: Theory and Applications

This work explores the case in which the data distributions in the RKHS are Gaussian, obtaining closed-form expressions for both the estimated Wasserstein distance and the optimal transport map via kernel matrices, and it generalizes the Bures metric on covariance matrices to infinite-dimensional settings, providing a new metric between covariance operators.
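As background for the Gaussian case mentioned above, the finite-dimensional closed form is the standard one below; per the summary, the paper's contribution is its extension to covariance operators in RKHS, which is not reproduced here.

```latex
% Standard closed form for the squared 2-Wasserstein distance between two
% Gaussians N(m_1, \Sigma_1) and N(m_2, \Sigma_2):
\[
W_2^2\big(\mathcal{N}(m_1,\Sigma_1), \mathcal{N}(m_2,\Sigma_2)\big)
  \;=\; \lVert m_1 - m_2 \rVert^2
  \;+\; \operatorname{tr}\!\Big(\Sigma_1 + \Sigma_2
  - 2\big(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\big)^{1/2}\Big)
\]
```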

Metrics induced by Jensen-Shannon and related divergences on positive definite matrices

S. Sra. Linear Algebra and its Applications, 2019.

Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions

It is established that MMD corresponds to the optimal risk of a kernel classifier, forming a natural link between the distance between distributions and their ease of classification; a generalization of MMD to families of kernels is also proposed.
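For reference, a standard way to state the MMD that this classifier-risk link refers to (given here as background, not in the paper's notation), in terms of RKHS mean embeddings and the unit-norm witness function:

```latex
% Population MMD between P and Q for an RKHS H with kernel k and mean
% embeddings \mu_P, \mu_Q:
\[
\operatorname{MMD}(P, Q)
  \;=\; \lVert \mu_P - \mu_Q \rVert_{\mathcal{H}}
  \;=\; \sup_{\lVert f \rVert_{\mathcal{H}} \le 1}
        \big( \mathbb{E}_{X \sim P}[f(X)] - \mathbb{E}_{Y \sim Q}[f(Y)] \big)
\]
```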

A metric approach toward point process divergence

This work addresses the problem of estimating the Jensen-Shannon divergence in a metric space using a nearest-neighbor-based approach, empirically demonstrates the validity of the proposed estimator, and compares it against other available methods in the context of the two-sample problem.

Information Theoretic Learning - Renyi's Entropy and Kernel Perspectives

J. Príncipe, 2010.
Students, practitioners, and researchers interested in statistical signal processing, computational intelligence, and machine learning will find in this book the theory to understand the basics, the algorithms to implement applications, and exciting but still unexplored leads that provide fertile ground for future research.

A Kernel Two-Sample Test

This work proposes a framework for analyzing and comparing distributions, uses it to construct statistical tests that determine whether two samples are drawn from different distributions, and presents two distribution-free tests based on large-deviation bounds for the maximum mean discrepancy (MMD).
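A minimal sketch of the widely used unbiased estimator of squared MMD with an RBF kernel; the kernel and bandwidth are illustrative assumptions, and the paper's tests additionally calibrate a rejection threshold (e.g., from large-deviation bounds), which is not shown here.

```python
# Hedged sketch: unbiased estimate of squared MMD between two samples with an
# RBF kernel (kernel choice and bandwidth are assumptions for illustration).
import numpy as np
from scipy.spatial.distance import cdist

def rbf(X, Y, sigma=1.0):
    return np.exp(-cdist(X, Y, "sqeuclidean") / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2 between samples X (n x d) and Y (m x d)."""
    n, m = len(X), len(Y)
    Kxx = rbf(X, X, sigma)
    Kyy = rbf(Y, Y, sigma)
    Kxy = rbf(X, Y, sigma)
    # Drop diagonal terms so the within-sample averages are unbiased.
    mean_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    mean_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return mean_xx + mean_yy - 2.0 * Kxy.mean()
```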

A new metric for probability distributions

We introduce a metric for probability distributions, which is bounded, information-theoretically motivated, and has a natural Bayesian interpretation. The square root of the well-known χ²…
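If I recall the construction correctly, the metric in question is the square root of twice the Jensen-Shannon divergence; treat the identity below as a recollection rather than a quotation of the paper.

```latex
% Recollected form of the metric, for distributions P, Q with mixture
% M = (P + Q)/2:
\[
D(P, Q) \;=\; \sqrt{\,2\,\mathrm{JS}(P \,\Vert\, Q)\,}
        \;=\; \sqrt{\,\mathrm{KL}(P \,\Vert\, M) + \mathrm{KL}(Q \,\Vert\, M)\,}
\]
```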

Random Features for Large-Scale Kernel Machines

Two sets of random features are explored, convergence bounds on their ability to approximate various radial basis kernels are provided, and it is shown that in large-scale classification and regression tasks, linear machine learning algorithms applied to these features outperform state-of-the-art large-scale kernel machines.
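A minimal sketch of random Fourier features for the RBF (Gaussian) kernel, one of the radial basis kernels the summary refers to; the number of features D and the bandwidth sigma are illustrative choices.

```python
# Hedged sketch: random Fourier features z(x) such that z(x) @ z(y)
# approximates the RBF kernel exp(-||x - y||^2 / (2 sigma^2)).
import numpy as np

def random_fourier_features(X, D=500, sigma=1.0, seed=0):
    """Map X (n x d) to features z(X) (n x D)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / sigma, size=(d, D))   # frequencies sampled from the kernel's spectral density
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)       # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

A linear model (e.g., ridge regression or a linear SVM) trained on z(X) then approximates the corresponding kernel machine at much lower cost, which is the large-scale setting the summary describes.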

f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization

It is shown that any f-divergence can be used to train generative neural samplers, and the effects of various choices of divergence function on training complexity and on the quality of the obtained generative models are discussed.
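For reference, the variational lower bound that this training scheme optimizes (the bound of Nguyen et al., with f* the convex conjugate of f; the critic class over which the supremum is taken is left abstract here):

```latex
% Variational lower bound on an f-divergence, with T a critic function:
\[
D_f(P \,\Vert\, Q) \;\ge\; \sup_{T}\;
  \mathbb{E}_{x \sim P}\big[T(x)\big] \;-\; \mathbb{E}_{x \sim Q}\big[f^{*}(T(x))\big]
\]
```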