# Learning Kernels for Structured Prediction using Polynomial Kernel Transformations

```bibtex
@article{Tonde2016LearningKF,
  title   = {Learning Kernels for Structured Prediction using Polynomial Kernel Transformations},
  author  = {Chetan Tonde and A. Elgammal},
  journal = {ArXiv},
  year    = {2016},
  volume  = {abs/1601.01411}
}
```

Learning the kernel functions used in kernel methods has been a widely explored area of machine learning. It is now broadly accepted that learning a good kernel function is key to obtaining strong performance. In this work we focus on learning kernel representations for structured regression. We propose the use of polynomial expansions of kernels, referred to as Schoenberg transforms and Gegenbauer transforms, which arise from the seminal result of Schoenberg (1938). These kernels can be…
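The abstract's polynomial kernel transformations can be illustrated with a minimal sketch. By Schoenberg's theorem (together with the Schur product theorem), applying a polynomial with nonnegative coefficients entrywise to a positive semidefinite Gram matrix yields another positive semidefinite Gram matrix. The function names and coefficients below are illustrative, not taken from the paper:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix of the Gaussian RBF kernel; entries lie in (0, 1].
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

def polynomial_transform(K, coeffs):
    # Apply g(t) = sum_i coeffs[i] * t**i entrywise to K.
    # With nonnegative coeffs, the result remains positive semidefinite
    # (Schur product theorem for each Hadamard power K**i).
    return sum(a * K**i for i, a in enumerate(coeffs))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
K = rbf_kernel(X)
Kt = polynomial_transform(K, coeffs=[0.5, 1.0, 0.25])  # g(t) = 0.5 + t + 0.25*t^2
print(np.linalg.eigvalsh(Kt).min() >= -1e-8)  # transformed kernel stays PSD
```

Learning the kernel then amounts to optimizing the nonnegative coefficients of the transformation, rather than picking a single fixed kernel.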

## One Citation

### A probability distribution kernel based on whitening transformation

- Computer Science
- 2017

Experiments show that the DPWT kernel exhibits superior performance compared to other state-of-the-art kernels, can effectively eliminate the correlation between vectors, and reduces data redundancy, which further improves classification accuracy.

## References

SHOWING 1-10 OF 31 REFERENCES

### Learning Translation Invariant Kernels for Classification

- Computer Science
- J. Mach. Learn. Res.
- 2010

This paper considers the problem of optimizing a kernel function over the class of translation invariant kernels for the task of binary classification and proposes a formulation of a QCQP sub-problem which does not require the kernel matrices to be loaded into memory, making the method applicable to large-scale problems.

### Algorithms for Learning Kernels Based on Centered Alignment

- Computer Science
- J. Mach. Learn. Res.
- 2012

These algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression.
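The summary above refers to centered kernel alignment. As a rough sketch (not the paper's own implementation), the centered alignment between two Gram matrices is the cosine similarity of their centered versions under the Frobenius inner product:

```python
import numpy as np

def centered_alignment(K1, K2):
    # Center each Gram matrix with H = I - 11^T / n, then take the
    # normalized Frobenius inner product of the centered matrices.
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = np.sign(X[:, 0])
K = X @ X.T           # linear kernel on the inputs
Ky = np.outer(y, y)   # ideal target kernel built from the labels
print(centered_alignment(K, Ky))
```

Kernel-learning algorithms of this family choose combination weights for base kernels so as to maximize this alignment with the label-derived target kernel.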

### On numerical optimization theory of infinite kernel learning

- Computer Science
- J. Glob. Optim.
- 2010

The results show that the new algorithm, called "infinite" kernel learning (IKL), efficiently improves classification accuracy on heterogeneous data sets compared to classical single-kernel approaches.

### Learning Convex Combinations of Continuously Parameterized Basic Kernels

- Computer Science
- COLT
- 2005

It is shown that there always exists an optimal kernel which is a convex combination of at most m + 1 basic kernels, where m is the sample size, and a necessary and sufficient condition for a kernel to be optimal is provided.

### Structured output-associative regression

- Computer Science
- 2009 IEEE Conference on Computer Vision and Pattern Recognition
- 2009

This work proposes a new structured learning method, Structured Output-Associative Regression (SOAR), that models not only the input-dependency but also the self-dependency of outputs, in order to provide an output re-correlation mechanism that complements the (more standard) input-based regressive prediction.

### A DC-programming algorithm for kernel selection

- Computer Science
- ICML
- 2006

This work builds upon a formulation involving a minimax optimization problem and a recently proposed greedy algorithm for learning the kernel to create a new algorithm which outperforms a previously proposed method.

### Support vector machine learning for interdependent and structured output spaces

- Computer Science
- ICML
- 2004

This paper proposes to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs, and demonstrates the versatility and effectiveness of the method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment.

### Max-Margin Markov Networks

- Computer Science
- NIPS
- 2003

Maximum margin Markov (M3) networks combine kernels, which efficiently handle high-dimensional features, with the ability to capture correlations in structured data; a new theoretical bound on generalization in structured domains is also provided.

### Learning the Kernel Function via Regularization

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2005

It is shown that, although K may be an uncountable set, the optimal kernel is always obtained as a convex combination of at most m+2 basic kernels, where m is the number of data examples.

### Multiple Kernel Learning Algorithms

- Computer Science
- J. Mach. Learn. Res.
- 2011

Overall, using multiple kernels instead of a single one is useful. Combining kernels in a nonlinear or data-dependent way appears more promising than linear combination when fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.