Corpus ID: 316473

On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection

Pratik Jawanpuria, Manik Varma, Saketha Nath Jagarlapudi
Our objective is to develop formulations and algorithms for efficiently computing the feature selection path, i.e., the variation in classification accuracy as the fraction of selected features is varied from null to unity. Multiple Kernel Learning subject to lp≤1 regularization (lp-MKL) has been demonstrated to be one of the most effective techniques for non-linear feature selection. However, state-of-the-art lp-MKL algorithms are too computationally expensive to be invoked thousands of times… 
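As a toy illustration of what a feature-selection path is (not the paper's lp-MKL path-following method), the sketch below ranks features by a simple score and records classification accuracy as the selected fraction grows from none to all. The scoring rule (absolute class-mean difference) and the nearest-centroid classifier are hypothetical stand-ins chosen only to keep the example self-contained:

```python
# Hypothetical sketch: trace accuracy as the fraction of selected
# features sweeps from 0 to 1. Not the paper's lp-MKL machinery.

def mean(xs):
    return sum(xs) / len(xs)

def feature_scores(X, y):
    """Score each feature by |class-1 mean - class-0 mean|."""
    scores = []
    for j in range(len(X[0])):
        m0 = mean([x[j] for x, label in zip(X, y) if label == 0])
        m1 = mean([x[j] for x, label in zip(X, y) if label == 1])
        scores.append(abs(m1 - m0))
    return scores

def nearest_centroid_accuracy(X, y, features):
    """Resubstitution accuracy of a nearest-centroid rule on `features`."""
    if not features:  # no features selected: fall back to majority vote
        return max(y.count(0), y.count(1)) / len(y)
    project = lambda x: [x[j] for j in features]
    c0 = [mean(col) for col in zip(*[project(x) for x, l in zip(X, y) if l == 0])]
    c1 = [mean(col) for col in zip(*[project(x) for x, l in zip(X, y) if l == 1])]
    correct = 0
    for x, label in zip(X, y):
        px = project(x)
        d0 = sum((a - b) ** 2 for a, b in zip(px, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(px, c1))
        correct += int((0 if d0 <= d1 else 1) == label)
    return correct / len(y)

def selection_path(X, y):
    """Accuracy for each prefix of features ranked by score (k = 0 .. d)."""
    scores = feature_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return [nearest_centroid_accuracy(X, y, order[:k])
            for k in range(len(order) + 1)]
```

Even on toy data the path need not be monotone: adding a noisy feature after the informative ones can lower accuracy, which is exactly why computing the whole path, rather than a single operating point, is informative.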


Multiple Graph-Kernel Learning

A Multiple Kernel Learning (MKL) approach is proposed to learn different weights for different bunches of features grouped by complexity; the work defines a notion of kernel complexity, namely Kernel Spectral Complexity, and shows how this complexity relates to the well-known Empirical Rademacher Complexity for a natural class of functions that includes SVMs.

Generalized hierarchical kernel learning

A generic regularizer enables the proposed formulation of Hierarchical Kernel Learning to be employed in Rule Ensemble Learning (REL), where the goal is to construct an ensemble of conjunctive propositional rules.

A Sequential Learning Approach for Scaling Up Filter-Based Feature Subset Selection

The proposed framework uses multi-armed bandit algorithms to sequentially search a subset of variables and assign a level of importance to each feature, allowing it to scale naturally to large data sets, evaluate such data very quickly, and run independently of the optimization of any classifier, reducing unnecessary complexity.

Learning Proximity Relations for Feature Selection

A theoretical analysis of the generalization error of the proposed method is provided, which validates its effectiveness and demonstrates the success of the approach when applied to feature selection.

A Geometric Viewpoint of the Selection of the Regularization Parameter in Some Support Vector Machines

This work proposes an algorithm that identifies the neighbouring vertices of a given vertex, thereby identifying the classifiers corresponding to the set of vertices of this polytope, and chooses a classifier based on a suitable test-error criterion.

Learning Kernels for Multiple Predictive Tasks

This thesis presents a family of regularized risk minimization based convex formulations, of increasing generality, for learning features (kernels) in various settings involving multiple tasks and proposes a mixed-norm based formulation for learning the shared kernel as well as the prediction functions of all the tasks.

Soft Kernel Target Alignment for Two-Stage Multiple Kernel Learning

ALIGNF+, a soft version of ALIGNF, is proposed, based on the observation that the dual problem of ALIGNF is essentially a one-class SVM problem and requires only an upper bound on the kernel weights of the original ALIGNF.

lp-Norm Multiple Kernel Learning

Two efficient interleaved optimization strategies for arbitrary norms are developed, and empirical applications of lp-norm MKL to three real-world problems from computational biology show that non-sparse MKL achieves accuracies that surpass the state of the art.
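A key reason lp-norm MKL scales well is that, with the per-kernel function norms held fixed, the kernel weights have a closed form. The sketch below implements the analytic update reported in the lp-norm MKL literature (Kloft et al.), d_m = ||w_m||^(2/(p+1)) / (Σ_k ||w_k||^(2p/(p+1)))^(1/p), which normalizes the weights so that ||d||_p = 1; the function name is ours:

```python
# Sketch of the analytic kernel-weight step from lp-norm MKL: given the
# per-kernel function norms ||w_m||, the optimal weights under the
# constraint ||d||_p <= 1 are
#     d_m = ||w_m||^(2/(p+1)) / (sum_k ||w_k||^(2p/(p+1)))^(1/p).

def lp_mkl_weights(w_norms, p):
    """Closed-form kernel weights for lp-norm MKL (requires p >= 1)."""
    numerators = [wn ** (2.0 / (p + 1)) for wn in w_norms]
    denominator = sum(wn ** (2.0 * p / (p + 1)) for wn in w_norms) ** (1.0 / p)
    return [n / denominator for n in numerators]
```

As p approaches 1 the update concentrates weight on the kernels with the largest norms (sparser combinations), while larger p flattens the weights toward uniform, matching the non-sparse behaviour discussed above.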

Multi Kernel Learning with Online-Batch Optimization

This work presents an MKL optimization algorithm based on stochastic gradient descent with a guaranteed convergence rate, and introduces a p-norm formulation of MKL that controls the level of sparsity of the solution, leading to an easier optimization problem.

More generality in efficient multiple kernel learning

It is observed that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization while retaining all the efficiency of existing large scale optimization algorithms.

Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

The extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

Multi-label Multiple Kernel Learning

The proposed learning formulation leads to a non-smooth min-max problem, which can be cast as a semi-infinite linear program (SILP); an approximate formulation with a guaranteed error bound, involving an unconstrained convex optimization problem, is also presented.

Ultra-Fast Optimization Algorithm for Sparse Multi Kernel Learning

This paper introduces a novel MKL formulation, which mixes elements of p-norm and elastic-net-style regularization, and proposes a fast stochastic gradient descent method that solves the novel MKL formulation.

L2 Regularization for Learning Kernels

This paper presents a novel theoretical analysis of the problem of learning kernels from the same family but with L2 regularization instead, and gives learning bounds for orthogonal kernels that contain only an additive term O(√p/m) compared with the standard kernel ridge regression stability bound.

From Lasso regression to Feature vector machine

A new approach named the Feature Vector Machine (FVM) is proposed, which reformulates standard Lasso regression into a form isomorphic to SVM; this form can easily be extended to feature selection with non-linear models by introducing kernels defined on feature vectors.

SPF-GMKL: generalized multiple kernel learning with a million kernels

A Spectral Projected Gradient descent optimizer is developed which takes into account second order information in selecting step sizes, employs a non-monotone step size selection criterion requiring fewer function evaluations, is robust to gradient noise, and can take quick steps when far away from the optimum.

Scalable training of L1-regularized log-linear models

This work presents an algorithm, Orthant-Wise Limited-memory Quasi-Newton (OWL-QN), based on L-BFGS, that can efficiently optimize the L1-regularized log-likelihood of log-linear models with millions of parameters.
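The idea at the heart of OWL-QN can be sketched briefly: because the L1 term c·||x||₁ is non-differentiable at zero, the method replaces the gradient with a pseudo-gradient that, at x_i = 0, uses whichever one-sided derivative points downhill (or zero if neither does). The sketch below is our own minimal rendering of that rule; the function name is hypothetical, and the real algorithm additionally projects each quasi-Newton step onto the chosen orthant:

```python
# Sketch of the pseudo-gradient OWL-QN substitutes for the gradient of
# f(x) + c * ||x||_1: differentiable coordinates get the usual sign term;
# at x_i = 0 the one-sided derivative pointing downhill is used, or 0 if
# zero is a local minimum along that coordinate.

def l1_pseudo_gradient(x, grad, c):
    """Pseudo-gradient of f(x) + c * ||x||_1 at x, given grad = f'(x)."""
    pg = []
    for xi, gi in zip(x, grad):
        if xi > 0:            # differentiable: subgradient is +c
            pg.append(gi + c)
        elif xi < 0:          # differentiable: subgradient is -c
            pg.append(gi - c)
        elif gi + c < 0:      # at 0, right directional derivative < 0
            pg.append(gi + c)
        elif gi - c > 0:      # at 0, left directional derivative > 0
            pg.append(gi - c)
        else:                 # 0 is a minimum along this coordinate
            pg.append(0.0)
    return pg
```

When the pseudo-gradient of a coordinate at zero is exactly zero, OWL-QN leaves that parameter at zero, which is how the method produces the sparse solutions that make L1 regularization attractive at this scale.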