# On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection

```bibtex
@inproceedings{Jawanpuria2014OnPP,
  title     = {On p-norm Path Following in Multiple Kernel Learning for Non-linear Feature Selection},
  author    = {Pratik Jawanpuria and M. Varma and Saketha Nath Jagarlapudi},
  booktitle = {ICML},
  year      = {2014}
}
```

Our objective is to develop formulations and algorithms for efficiently computing the feature selection path, i.e. the variation in classification accuracy as the fraction of selected features is varied from null to unity. Multiple Kernel Learning subject to ℓp (p ≥ 1) regularization (ℓp-MKL) has been demonstrated to be one of the most effective techniques for non-linear feature selection. However, state-of-the-art ℓp-MKL algorithms are too computationally expensive to be invoked thousands of times…
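The ℓp-MKL machinery the abstract refers to can be illustrated with the standard closed-form kernel-weight update from the ℓp-MKL literature, d_m ∝ ‖w_m‖^(2/(p+1)) with the weight vector normalized to unit ℓp-norm: as p approaches 1 the weights concentrate on the most informative kernels (features), which is the mechanism a feature-selection path exploits. The sketch below shows only that weight update, not the paper's path-following algorithm; the function name and the example per-kernel norms are illustrative assumptions.

```python
import numpy as np

def lp_mkl_weights(norms, p):
    """Closed-form kernel weights for lp-MKL (p > 1):
    d_m proportional to ||w_m||^(2/(p+1)), normalized so ||d||_p = 1.
    `norms` holds the per-kernel norms ||w_m|| at the current solution."""
    d = norms ** (2.0 / (p + 1.0))
    return d / np.linalg.norm(d, ord=p)

# Illustrative per-kernel norms: kernel 0 is far more informative.
norms = np.array([5.0, 1.0, 0.5, 0.1])

# Moving p toward 1 concentrates weight on the strong kernels,
# which is how varying p traces out a feature-selection path.
for p in [4.0, 2.0, 1.1]:
    print(p, np.round(lp_mkl_weights(norms, p), 3))
```

The ratio between the largest and smallest weight grows as p shrinks toward 1, so sweeping p from large values down to 1 moves the selected-feature fraction from dense toward sparse.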

#### 28 Citations

Exploiting the structure of feature spaces in kernel learning

- Mathematics
- 2016

The problem of learning the optimal representation for a specific task has recently become an important and non-trivial topic in the machine learning community. In this field, deep architectures are the…

Multiple Graph-Kernel Learning

- Computer Science
- 2015 IEEE Symposium Series on Computational Intelligence
- 2015

A Multiple Kernel Learning (MKL) approach is presented that learns different weights for different bunches of features grouped by complexity; it defines a notion of kernel complexity, namely Kernel Spectral Complexity, and shows how this complexity relates to the well-known Empirical Rademacher Complexity for a natural class of functions that includes SVMs.

ℓ2,1-Norm Regularized Multi-kernel Based Joint Nonlinear Feature Selection and Over-sampling for Imbalanced Data Classification

- Mathematics, Computer Science
- Neurocomputing
- 2017

The experimental results demonstrate that jointly performing nonlinear feature selection and over-sampling within an ℓ2,1-norm multi-kernel learning framework (ℓ2,1-MKFSOS) leads to promising classification performance.

Generalized hierarchical kernel learning

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2015

A generic regularizer enables the proposed formulation of Hierarchical Kernel Learning to be employed in Rule Ensemble Learning (REL), where the goal is to construct an ensemble of conjunctive propositional rules.

A Sequential Learning Approach for Scaling Up Filter-Based Feature Subset Selection

- Computer Science, Medicine
- IEEE Transactions on Neural Networks and Learning Systems
- 2018

The proposed framework uses multi-armed bandit algorithms to sequentially search a subset of variables and assign a level of importance to each feature, allowing it to scale naturally to large data sets, evaluate such data very quickly, and operate independently of the optimization of any classifier, reducing unnecessary complexity.

Incorporating Distribution Matching into Uncertainty for Multiple Kernel Active Learning

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2021

A multiple kernel active learning framework is proposed that incorporates a group regularizer of distribution information into the estimation of uncertainty and exploits multiple kernel learning to learn a kernel space in which complex structures are well captured by the kernel weights.

Learning Proximity Relations for Feature Selection

- Computer Science
- IEEE Transactions on Knowledge and Data Engineering
- 2016

A theoretical analysis of the generalization error of the proposed method is provided, which validates its effectiveness and demonstrates the success of applying the approach to feature selection.

A Geometric Viewpoint of the Selection of the Regularization Parameter in Some Support Vector Machines

- Computer Science
- MIKE
- 2015

This work proposes an algorithm that identifies the neighbouring vertices of a given vertex, thereby identifying the classifiers corresponding to the set of vertices of this polytope, and chooses a classifier based on a suitable test-error criterion.

Learning Kernels for Multiple Predictive Tasks

- Computer Science
- 2014

This thesis presents a family of regularized-risk-minimization-based convex formulations, of increasing generality, for learning features (kernels) in various settings involving multiple tasks, and proposes a mixed-norm based formulation for learning the shared kernel as well as the prediction functions of all the tasks.

Soft Kernel Target Alignment for Two-Stage Multiple Kernel Learning

- Computer Science
- DS
- 2016

ALIGNF+, a soft version of ALIGNF, is proposed, based on the observation that the dual problem of ALIGNF is essentially a one-class SVM problem; it requires only an upper bound on the kernel weights of the original ALIGNF.

#### References

Showing 1–10 of 53 references

ℓp-Norm Multiple Kernel Learning

- Mathematics
- 2011

Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel…

Multi Kernel Learning with Online-Batch Optimization

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2012

This work presents an MKL optimization algorithm based on stochastic gradient descent that has a guaranteed convergence rate, and introduces a p-norm formulation of MKL that controls the level of sparsity of the solution, leading to an easier optimization problem.

More generality in efficient multiple kernel learning

- Mathematics, Computer Science
- ICML '09
- 2009

It is observed that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization while retaining all the efficiency of existing large-scale optimization algorithms.

Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

- Computer Science, Mathematics
- NIPS
- 2008

The extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

Multi-label Multiple Kernel Learning

- Computer Science, Mathematics
- NIPS
- 2008

The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP) and an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem.

Ultra-Fast Optimization Algorithm for Sparse Multi Kernel Learning

- Mathematics, Computer Science
- ICML
- 2011

This paper introduces a novel MKL formulation, which mixes elements of p-norm and elastic-net regularization, and proposes a fast stochastic gradient descent method that solves the novel MKL formulation.

L2 Regularization for Learning Kernels

- Computer Science, Mathematics
- UAI
- 2009

This paper presents a novel theoretical analysis of the problem of learning kernels with the same family of kernels but with L2 regularization instead, and gives learning bounds for orthogonal kernels that contain only an additive term O(√p/m) when compared to the standard kernel ridge regression stability bound.

From Lasso regression to Feature vector machine

- Computer Science, Mathematics
- NIPS
- 2005

A new approach named the Feature Vector Machine (FVM) is presented, which reformulates standard Lasso regression into a form isomorphic to SVM; this form can be easily extended for feature selection with non-linear models by introducing kernels defined on feature vectors.

SPF-GMKL: generalized multiple kernel learning with a million kernels

- Mathematics, Computer Science
- KDD
- 2012

A Spectral Projected Gradient descent optimizer is developed which takes into account second order information in selecting step sizes, employs a non-monotone step size selection criterion requiring fewer function evaluations, is robust to gradient noise, and can take quick steps when far away from the optimum.

Scalable training of L1-regularized log-linear models

- Mathematics, Computer Science
- ICML '07
- 2007

This work presents an algorithm Orthant-Wise Limited-memory Quasi-Newton (OWL-QN), based on L-BFGS, that can efficiently optimize the L1-regularized log-likelihood of log-linear models with millions of parameters.