# A Family of Probabilistic Kernels Based on Information Divergence

```bibtex
@inproceedings{Chan2004AFO,
  title  = {A Family of Probabilistic Kernels Based on Information Divergence},
  author = {Antoni B. Chan and Nuno Vasconcelos and Pedro J. Moreno},
  year   = {2004}
}
```

Probabilistic kernels offer a way to combine generative models with discriminative classifiers. We establish connections between probabilistic kernels and feature space kernels through a geometric interpretation of the previously proposed probability product kernel. A family of probabilistic kernels, based on information divergence measures, is then introduced and its connections to various existing probabilistic kernels are analyzed. The new family is shown to provide a unifying framework for…
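As a concrete illustration of the probability product kernel mentioned in the abstract (a sketch for discrete distributions, not the paper's exact construction), setting ρ = 1/2 recovers the Bhattacharyya kernel and ρ = 1 the expected likelihood kernel:

```python
import numpy as np

def product_kernel(p, q, rho=0.5):
    """Probability product kernel K(p, q) = sum_x p(x)^rho * q(x)^rho
    for discrete distributions p and q over the same support.
    rho = 1/2 gives the Bhattacharyya kernel (so K(p, p) = 1);
    rho = 1 gives the expected likelihood kernel."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p**rho * q**rho))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(product_kernel(p, q))   # Bhattacharyya kernel value in (0, 1]
print(product_kernel(p, p))   # equals 1 at rho = 1/2
```

The kernel is symmetric by construction, and at ρ = 1/2 it attains its maximum of 1 exactly when the two distributions coincide.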

#### 60 Citations

Deriving Probabilistic SVM Kernels from Exponential Family Approximations to Multivariate Distributions for Count Data

- Computer Science
- Unsupervised and Semi-Supervised Learning
- 2019

A robust hybrid probabilistic learning approach is proposed that appropriately combines the advantages of both generative and discriminative models for modeling count data, and demonstrates the flexibility and merits of the proposed frameworks on the problem of analyzing activities in surveillance scenes.

Deriving Probabilistic SVM Kernels From Flexible Statistical Mixture Models and its Application to Retinal Images Classification

- Computer Science
- IEEE Access
- 2019

The developed hybrid model is introduced in this paper as an effective SVM kernel able to incorporate prior knowledge about the nature of the data involved in the problem at hand and, therefore, permits good data discrimination.

Bayesian hybrid generative discriminative learning based on finite Liouville mixture models

- Mathematics, Computer Science
- Pattern Recognit.
- 2011

The true structure of non-Gaussian and especially proportional vector data is discovered by building probabilistic kernels from generative mixture models based on the Liouville family, from which the Beta-Liouville distribution is developed; this family includes the well-known Dirichlet as a special case.

Sequence Classification in the Jensen-Shannon Embedding

- 2006

This paper presents a novel approach to the supervised classification of structured objects such as sequences, trees and graphs, when the input instances are characterized by probability…

Tensor-Based Gaussian Processes Regression Using a Probabilistic Kernel with Information Divergence

- 2016

We present Gaussian process regression for tensor-valued inputs, based on a coherent treatment of a prior for the latent function with a covariance function defined on a tensor…

A Note on Gradient Based Learning in Vector Quantization Using Differentiable Kernels for Hilbert and

- 2012

Supervised and unsupervised prototype-based vector quantization is frequently carried out in Euclidean space. In recent years, non-standard metrics have also become popular. For classification by…

A Finite Gamma Mixture Model-Based Discriminative Learning Frameworks

- Computer Science
- 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)
- 2015

A hybrid generative discriminative framework based on support vector machines and Gamma mixtures is developed, which focuses on the generation of kernels when examples are structured data modeled by Gamma mixtures.

Deriving kernels from generalized Dirichlet mixture models and applications

- Mathematics, Computer Science
- Inf. Process. Manag.
- 2013

A class of generative kernels for non-Gaussian data classification is derived from finite mixture models based on the generalized Dirichlet distribution, which has been shown to be effective for modeling this kind of data.

A Tensor-Variate Gaussian Process for Classification of Multidimensional Structured Data

- Computer Science
- AAAI
- 2013

Simulation results demonstrate the effectiveness and advantages of the proposed approach for classification of multiway tensor data, especially when the underlying structural information across modes is discriminative for the classification task.

Beyond hybrid generative discriminative learning: spherical data classification

- Computer Science
- Pattern Analysis and Applications
- 2013

This paper investigates a generative mixture model for clustering spherical data based on the Langevin distribution, formulates a unified probabilistic framework, and demonstrates the effectiveness and merits of the proposed learning framework on synthetic data and on challenging applications, including spam filtering using both textual and visual email contents.

#### References

Showing 1–10 of 17 references.

Bhattacharyya and Expected Likelihood Kernels

- 2003

We introduce a new class of kernels between distributions. These induce a kernel on the input space between data points by associating to each datum a generative model fit to the data point…

A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

- Computer Science, Mathematics
- NIPS
- 2003

This paper suggests an alternative procedure to the Fisher kernel for systematically finding kernel functions that naturally handle variable length sequence data in multimedia domains and derives a kernel distance based on the Kullback-Leibler (KL) divergence between generative models.
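The KL-based construction summarized above can be sketched, under the assumption of discrete distributions with strictly positive entries, as an exponentiated symmetric KL divergence (the scaling constant `a` here is illustrative):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete
    distributions with strictly positive entries."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def kl_kernel(p, q, a=1.0):
    """Kernel from the symmetrized KL divergence:
    K(p, q) = exp(-a * (KL(p||q) + KL(q||p)))."""
    return float(np.exp(-a * (kl(p, q) + kl(q, p))))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl_kernel(p, q))   # in (0, 1]; equals 1 iff p == q
```

Symmetrizing the divergence makes the resulting function symmetric in its arguments, a prerequisite for use as an SVM kernel, although positive definiteness is not guaranteed in general.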

A New Discriminative Kernel from Probabilistic Models

- Medicine, Computer Science
- Neural Computation
- 2002

This work proposes a new discriminative TOP kernel derived from tangent vectors of posterior log-odds and develops a theoretical framework on feature extractors from probabilistic models, using it to analyze the TOP kernel.

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition

- Mathematics, Computer Science
- ECCV
- 2004

A taxonomy of kernels is derived based on the combination of the KL-kernel with various probabilistic representations previously proposed in the recognition literature, showing that these kernels can significantly outperform traditional SVM solutions for recognition.

Classes of Kernels for Machine Learning: A Statistics Perspective

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2001

The spectral representation of the various classes of kernels is described, and the characterization of nonlinear maps that reduce nonstationary kernels to either stationarity or local stationarity is discussed.

Feature Space Interpretation of SVMs with Non-Positive-Definite Kernels (Internal Report 1/03)

- 2003

The widespread habit of "plugging" arbitrary symmetric functions as kernels into support vector machines (SVMs) often yields good empirical classification results. However, in the case of non-conditionally…

Exploiting Generative Models in Discriminative Classifiers

- Mathematics, Computer Science
- NIPS
- 1998

A natural way of achieving this combination is developed by deriving kernel functions, for use in discriminative methods such as support vector machines, from generative probability models.

A comparison of methods for multiclass support vector machines

- Computer Science, Medicine
- IEEE Trans. Neural Networks
- 2002

Decomposition implementations for two "all-together" multiclass SVM methods are given, and it is shown that for large problems, methods that consider all data at once generally need fewer support vectors.

Divergence measures based on the Shannon entropy

- Mathematics, Computer Science
- IEEE Trans. Inf. Theory
- 1991

A novel class of information-theoretic divergence measures based on the Shannon entropy is introduced; these measures do not require the condition of absolute continuity to be satisfied by the probability distributions involved, and bounds on them are established.
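For illustration, the Jensen-Shannon divergence from this class can be sketched for discrete distributions; unlike KL divergence it stays finite when the supports differ (no absolute continuity needed), and with base-2 logarithms it is bounded by 1:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def jsd(p, q):
    """Jensen-Shannon divergence:
    JSD(p, q) = H((p + q) / 2) - (H(p) + H(q)) / 2."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))

# Disjoint supports: KL would be infinite, JSD is finite (and maximal).
print(jsd([1.0, 0.0], [0.0, 1.0]))   # 1.0
```

The divergence is symmetric and vanishes exactly when the two distributions coincide, which makes it a convenient building block for divergence-based kernels.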

Online handwriting recognition with support vector machines - a kernel approach

- Computer Science
- Proceedings Eighth International Workshop on Frontiers in Handwriting Recognition
- 2002

A novel classification approach for online handwriting recognition is described that combines dynamic time warping (DTW) and support vector machines (SVMs) by establishing a new SVM kernel that directly addresses the problem of discrimination by creating class boundaries, and is thus less sensitive to modeling assumptions.