Corpus ID: 6341365

Learning Kernels for Structured Prediction using Polynomial Kernel Transformations

Chetan Tonde, A. Elgammal
Learning the kernel functions used in kernel methods has been a vastly explored area in machine learning. It is now widely accepted that learning a kernel function is the key challenge in obtaining 'good' performance. In this work we focus on learning kernel representations for structured regression. We propose the use of polynomial expansions of kernels, referred to as Schoenberg transforms and Gegenbauer transforms, which arise from the seminal result of Schoenberg (1938). These kernels can be…
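The core idea behind these transforms rests on a classical fact: applying a polynomial with nonnegative coefficients, elementwise, to a positive semidefinite kernel matrix yields another valid kernel (each Hadamard power is PSD by the Schur product theorem). A minimal sketch of this construction, with an illustrative helper name not taken from the paper:

```python
import numpy as np

def polynomial_kernel_transform(K, coeffs):
    """Apply a polynomial with nonnegative coefficients to a kernel matrix.

    If K is positive semidefinite and all coeffs are >= 0, the result
    sum_i coeffs[i] * K**i (elementwise powers) is again a valid kernel.
    """
    if any(c < 0 for c in coeffs):
        raise ValueError("coefficients must be nonnegative to preserve PSD")
    out = np.zeros_like(K, dtype=float)
    for i, c in enumerate(coeffs):
        out += c * np.power(K, i)  # elementwise (Schur/Hadamard) power
    return out

# Base linear kernel on toy data
X = np.random.default_rng(0).standard_normal((5, 3))
K = X @ X.T
K2 = polynomial_kernel_transform(K, [1.0, 0.5, 0.25])
# Eigenvalues of the transformed matrix stay nonnegative (up to tolerance)
assert np.all(np.linalg.eigvalsh(K2) > -1e-9)
```

The coefficient vector here becomes the object being learned: optimizing over nonnegative polynomial coefficients searches a family of valid kernels rather than a single fixed one.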

A probability distribution kernel based on whitening transformation

Experiments show that the DPWT kernel exhibits superior performance compared to other state-of-the-art kernels; it can effectively eliminate the correlation between vectors and reduce data redundancy, which further improves classification accuracy.

Learning Translation Invariant Kernels for Classification

This paper considers the problem of optimizing a kernel function over the class of translation invariant kernels for the task of binary classification and proposes a formulation of a QCQP sub-problem which does not require the kernel matrices to be loaded into memory, making the method applicable to large-scale problems.

Algorithms for Learning Kernels Based on Centered Alignment

These algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression.

On numerical optimization theory of infinite kernel learning

The results show that the new algorithm, called "infinite" kernel learning (IKL), improves classification accuracy on heterogeneous data sets compared to classical single-kernel approaches.

Learning Convex Combinations of Continuously Parameterized Basic Kernels

It is shown that there always exists an optimal kernel that is a convex combination of at most m + 1 basic kernels, where m is the sample size, and a necessary and sufficient condition for a kernel to be optimal is provided.

Structured output-associative regression

This work proposes a new structured learning method, Structured Output-Associative Regression (SOAR), that models not only the input-dependency but also the self-dependency of outputs, in order to provide an output re-correlation mechanism that complements the (more standard) input-based regressive prediction.

A DC-programming algorithm for kernel selection

This work builds upon a formulation involving a minimax optimization problem and a recently proposed greedy algorithm for learning the kernel to create a new algorithm which outperforms a previously proposed method.

Support vector machine learning for interdependent and structured output spaces

This paper proposes to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs, and demonstrates the versatility and effectiveness of the method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment.

Max-Margin Markov Networks

Maximum margin Markov (M3) networks incorporate both kernels, which efficiently deal with high-dimensional features, and the ability to capture correlations in structured data; a new theoretical bound for generalization in structured domains is also provided.

Learning the Kernel Function via Regularization

It is shown that, although K may be an uncountable set, the optimal kernel is always obtained as a convex combination of at most m+2 basic kernels, where m is the number of data examples.

Multiple Kernel Learning Algorithms

Overall, using multiple kernels instead of a single one is useful. Combining kernels in a nonlinear or data-dependent way seems more promising than linear combination when fusing information from simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.
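The linear-combination baseline that most of these methods build on can be sketched in a few lines: a convex combination of base kernel matrices (nonnegative weights summing to one) is itself a valid kernel. The base kernels and weights below are illustrative, not taken from any of the papers above:

```python
import numpy as np

def convex_combination(kernels, weights):
    """Combine base kernel matrices with nonnegative weights summing to one."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must form a convex combination")
    return sum(wi * Ki for wi, Ki in zip(w, kernels))

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))
K_lin = X @ X.T                                                   # linear kernel
K_rbf = np.exp(-0.5 * np.sum((X[:, None] - X[None, :]) ** 2, -1))  # Gaussian kernel
K = convex_combination([K_lin, K_rbf], [0.5, 0.5])
# Any convex combination of PSD matrices is PSD
assert np.all(np.linalg.eigvalsh(K) > -1e-9)
```

The "uniform combination" repeatedly mentioned above corresponds to equal weights; the learning algorithms differ in how they tune these weights (or replace the linear rule with a nonlinear one).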