# Double Sparsity Kernel Learning with Automatic Variable Selection and Data Extraction.

@article{Chen2018DoubleSK,
  title   = {Double Sparsity Kernel Learning with Automatic Variable Selection and Data Extraction},
  author  = {Jingxiang Chen and Chong Zhang and Michael R. Kosorok and Yufeng Liu},
  journal = {Statistics and Its Interface},
  year    = {2018},
  volume  = {11},
  number  = {3},
  pages   = {401--420}
}

Learning in the Reproducing Kernel Hilbert Space (RKHS) has been widely used in many scientific disciplines. Because an RKHS can be very flexible, it is common to impose a regularization term in the optimization to prevent overfitting. Standard RKHS learning employs the squared norm penalty of the learning function. Despite its success, many challenges remain. In particular, one cannot directly use the squared norm penalty for variable selection or data extraction. Therefore, when there exists…
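For context, the "standard RKHS learning with a squared norm penalty" that the abstract contrasts against can be sketched as kernel ridge regression: by the representer theorem, the penalized minimizer has the form f(x) = Σᵢ αᵢ k(xᵢ, x) with α solving a regularized linear system. This is a minimal illustration of the baseline, not the paper's double-sparsity method; the function names and the RBF kernel choice are illustrative.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - z_j||^2)
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, lam=0.1, gamma=1.0):
    # Squared-norm-penalized RKHS regression: minimize
    #   (1/n) * sum_i (y_i - f(x_i))^2 + lam * ||f||_K^2.
    # By the representer theorem, alpha = (K + n*lam*I)^{-1} y.
    n = len(y)
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=50)
alpha = kernel_ridge_fit(X, y, lam=0.01, gamma=5.0)
y_hat = rbf_kernel(X, X, gamma=5.0) @ alpha  # fitted values on the training set
```

Note that the penalty acts on the whole function norm, which is why, as the abstract observes, it cannot by itself zero out individual input variables or training points.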


## 5 Citations

A Tweedie Compound Poisson Model in Reproducing Kernel Hilbert Space

- Computer Science
- 2021

A kernel Tweedie model with integrated variable selection that provides much needed modeling flexibility and capability in ratemaking and loss-reserving in general insurance and is implemented in an efficient and user-friendly R package.

Group-based local adaptive deep multiple kernel learning with lp norm

- Computer Science, PLoS ONE
- 2020

Experiments on UCI and Caltech 256 datasets demonstrate that the proposed GLDMKL method is more accurate in classification accuracy than other deep multiple kernel learning methods, especially for datasets with relatively complex data.

Distributed Generalized Cross-Validation for Divide-and-Conquer Kernel Ridge Regression and Its Asymptotic Optimality

- Computer Science, Journal of Computational and Graphical Statistics
- 2019

A distributed generalized cross-validation (dGCV) is proposed as a data-driven tool for selecting the tuning parameters in d-KRR and shown to be asymptotically optimal in the sense that minimizing the dGCV score is equivalent to minimizing the true global conditional empirical loss of the averaged function estimator.

Sparse Feature Selection in Kernel Discriminant Analysis via Optimal Scoring

- Computer Science, AISTATS
- 2019

This work considers the two-group classification problem and proposes a kernel classifier based on the optimal scoring framework, which provides theoretical guarantees on the expected risk consistency of the method and allows for feature selection by imposing structured sparsity using weighted kernels.

Nonlinear Variable Selection via Deep Neural Networks

- Computer Science, J. Comput. Graph. Stat.
- 2021

A novel algorithm, called deep feature selection, is proposed to estimate both the sparse parameter and the other parameters in the selection layer; algorithm convergence and selection consistency are established when the objective function has a generalized stable restricted Hessian.

## References

Showing 1–10 of 43 references

On Quantile Regression in Reproducing Kernel Hilbert Spaces with the Data Sparsity Constraint

- Computer Science, J. Mach. Learn. Res.
- 2016

It is demonstrated that the proposed data sparsity method can achieve competitive prediction performance in certain situations, and comparable performance in other cases, relative to the traditional squared norm penalty.

Automatic Feature Selection via Weighted Kernels and Regularization

- Computer Science
- 2013

This article proposes to achieve feature selection by optimizing a simple criterion: a feature-regularized loss function, which is minimized by estimating the weights in conjunction with the coefficients of the original classification or regression problem, thereby automatically procuring a subset of important features.

A Selective Overview of Variable Selection in High Dimensional Feature Space.

- Computer Science, Statistica Sinica
- 2010

A brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection is presented and the properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized.

Sure independence screening for ultrahigh dimensional feature space

- Computer Science
- 2006

The concept of sure screening is introduced and a sure screening method that is based on correlation learning, called sure independence screening, is proposed to reduce dimensionality from high to a moderate scale that is below the sample size.
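The correlation-learning idea described above is simple to state: rank the features by the absolute marginal correlation between each feature and the response, and keep only the top d ≪ p of them before fitting any model. A minimal numpy sketch (function name and data are illustrative, not from the cited paper):

```python
import numpy as np

def sis_screen(X, y, d):
    # Sure independence screening: rank features by absolute marginal
    # correlation with the response and keep the indices of the top d.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    omega = np.abs(Xc.T @ yc) / len(y)        # |marginal correlation| per feature
    return np.argsort(omega)[::-1][:d]        # indices of the d largest

rng = np.random.default_rng(1)
n, p = 100, 1000                               # ultrahigh-dimensional: p >> n
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 7] + 0.1 * rng.normal(size=n)
kept = sis_screen(X, y, d=20)                  # reduce p=1000 down to 20 features
```

The "sure" in the name refers to the property that, under conditions given in the paper, the retained set contains all truly active features with probability tending to one.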

Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties

- Mathematics, Computer Science
- 2001

In this article, penalized likelihood approaches are proposed to handle variable selection problems, and it is shown that the newly proposed estimators perform as well as the oracle procedure in variable selection; namely, they perform as well as if the correct submodel were known.

Fast rates for support vector machines using Gaussian kernels

- Computer Science
- 2007

This work uses concepts like Tsybakov's noise assumption and local Rademacher averages to establish learning rates up to the order of n^{-1} for nontrivial distributions, and introduces a geometric assumption for distributions that allows the approximation properties of Gaussian RBF kernels to be estimated.

Regularization and variable selection via the elastic net

- Computer Science
- 2005

It is shown that the elastic net often outperforms the lasso while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like the LARS algorithm does for the lasso.
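The elastic net combines the lasso's ℓ₁ penalty (sparsity) with a ridge ℓ₂ penalty (stability under correlated features). A minimal coordinate-descent sketch of the standard objective (1/2n)‖y − Xβ‖² + λ(α‖β‖₁ + (1−α)/2 ‖β‖₂²) — illustrative naive code, not the LARS-EN path algorithm from the cited paper:

```python
import numpy as np

def soft(z, t):
    # Soft-thresholding operator: the proximal map of the l1 penalty.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    # Naive cyclic coordinate descent for
    #   (1/2n)||y - X b||^2 + lam * (alpha*||b||_1 + (1-alpha)/2*||b||_2^2).
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]          # partial residual without feature j
            rho = X[:, j] @ r / n
            denom = X[:, j] @ X[:, j] / n + lam * (1 - alpha)
            b[j] = soft(rho, lam * alpha) / denom   # l1 thresholds, l2 shrinks
    return b

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
y = X @ np.array([3.0, -2.0] + [0.0] * 8) + 0.1 * rng.normal(size=100)
b = elastic_net(X, y)   # large coefficients survive, noise features shrink toward 0
```

Setting alpha=1 recovers the lasso update and alpha=0 recovers ridge regression, which is the sense in which the elastic net interpolates between the two.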

Component selection and smoothing in multivariate nonparametric regression

- Computer Science, Mathematics
- 2006

A detailed analysis reveals that the COSSO does model selection by applying a novel soft thresholding type operation to the function components, which leads naturally to an iterative algorithm.

Multiple Response Regression for Gaussian Mixture Models with Known Labels

- Computer Science, Stat. Anal. Data Min.
- 2012

It is demonstrated that both estimation and prediction can be improved by modeling all groups jointly using the proposed methods, which estimate the regression coefficient matrix, as well as the conditional inverse covariance matrix of response variables.

On Model Selection Consistency of Lasso

- Computer Science, J. Mach. Learn. Res.
- 2006

It is proved that a single condition, which is called the Irrepresentable Condition, is almost necessary and sufficient for Lasso to select the true model both in the classical fixed p setting and in the large p setting as the sample size n gets large.