Corpus ID: 245502263

Optimal Model Averaging of Support Vector Machines in Diverging Model Spaces

@article{Yuan2021OptimalMA,
  title={Optimal Model Averaging of Support Vector Machines in Diverging Model Spaces},
  author={Chaoxia Yuan and Chao Ying and Zhou Yu and Fang Fang},
  journal={ArXiv},
  year={2021},
  volume={abs/2112.12961}
}
The support vector machine (SVM) is a powerful classification method that has achieved great success in many fields. Since its performance can be seriously impaired by redundant covariates, model selection techniques are widely used for SVM with high-dimensional covariates. As an alternative to model selection, significant progress has been made in the area of model averaging over the past decades. Yet no frequentist model averaging method has been considered for SVM. This work aims to fill the gap and to…
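To fix ideas, a generic frequentist model averaging estimator for classification can be sketched as follows; the candidate models, loss, and weight criterion below are illustrative assumptions rather than the exact construction of this paper.

\[
\hat{f}(x; w) = \sum_{m=1}^{M} w_m \hat{f}_m(x),
\qquad
w \in \mathcal{W} = \Big\{ w \in [0,1]^{M} : \sum_{m=1}^{M} w_m = 1 \Big\},
\]

where \hat{f}_m is the SVM decision function fitted under the m-th candidate set of covariates. A data-driven weight vector can then be obtained by minimizing a delete-one cross-validation criterion,

\[
\hat{w} = \arg\min_{w \in \mathcal{W}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(y_i, \hat{f}^{(-i)}(x_i; w)\big),
\]

with \hat{f}^{(-i)} the leave-one-out analogue of the averaged predictor and \ell a classification loss such as the hinge loss.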


References

Showing 1-10 of 46 references

Variable selection for support vector machines in moderately high dimensions

It is proved that, in ultrahigh dimensions, the objective function of non-convex penalized SVMs has a local minimizer with the desired oracle property, and that the local linear approximation algorithm is guaranteed to converge to the oracle estimator even in the ultrahigh-dimensional setting if an appropriate initial estimator is available.
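A minimal sketch of the kind of non-convex penalized SVM objective studied in this line of work, assuming a linear classifier and a folded-concave penalty p_\lambda (e.g., SCAD):

\[
\min_{\beta_0, \beta} \; \frac{1}{n} \sum_{i=1}^{n} \big(1 - y_i(\beta_0 + x_i^{\top}\beta)\big)_{+}
\; + \; \sum_{j=1}^{p} p_{\lambda}(|\beta_j|),
\]

where (u)_{+} = \max(u, 0) is the hinge loss; the oracle estimator referred to above is the fit that uses only the truly relevant covariates.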

A Consistent Information Criterion for Support Vector Machines in Diverging Model Spaces

It is shown that a modification of the support vector machine information criterion achieves model selection consistency even when the number of features diverges at an exponential rate of the sample size, and that it can further be applied to selecting the optimal tuning parameter for various penalized support vector machine methods.

A Model-Averaging Approach for High-Dimensional Regression

A model-averaging procedure is developed for high-dimensional regression problems in which the number of predictors p exceeds the sample size n, and a theorem is proved showing that delete-one cross-validation achieves the lowest possible prediction loss asymptotically.

A weight-relaxed model averaging approach for high-dimensional generalized linear models

A general result is established to show the existence of pseudo-true regression parameters under model misspecification, and proper conditions for the leave-one-out cross-validation weight selection to achieve asymptotic optimality are derived.

Statistical performance of support vector machines

The main result shows that it is possible to obtain fast rates of convergence for SVMs and builds on the observation made by other authors that the SVM can be viewed as a statistical regularization procedure.
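In a standard formulation (given here as a sketch, assuming a reproducing kernel Hilbert space \mathcal{H} and regularization parameter \lambda > 0), viewing the SVM as a regularization procedure amounts to solving

\[
\hat{f} = \arg\min_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} \big(1 - y_i f(x_i)\big)_{+} \; + \; \lambda \, \| f \|_{\mathcal{H}}^{2},
\]

that is, empirical hinge-loss minimization penalized by the RKHS norm; fast rates concern how quickly the excess classification risk of \hat{f} decays with the sample size n.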

Parsimonious Model Averaging With a Diverging Number of Parameters

It is proved that the proposed procedure is asymptotically optimal in the sense that its squared prediction loss and risk are asymptotically identical to those of the best, but infeasible, model averaging estimator.
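In this literature, asymptotic optimality of a data-driven weight vector \hat{w} is typically formalized as

\[
\frac{L(\hat{w})}{\inf_{w \in \mathcal{W}} L(w)} \;\xrightarrow{\;p\;}\; 1,
\]

where L(w) denotes the squared prediction loss of the averaged estimator with weights w and \mathcal{W} is the weight simplex (notation sketched here for illustration), so that the estimated weights perform, in the limit, as well as the infeasible best weights.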

Jackknife model averaging for high-dimensional quantile regression.

A frequentist model averaging method is proposed for quantile regression with high-dimensional covariates, using a delete-one cross-validation method to select the model weights, and the resulting estimator is proved to possess an optimal asymptotic property uniformly over quantile indices in any compact subset of (0,1).
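For concreteness, the delete-one (jackknife) weight criterion in quantile regression is built from the check loss; a sketch, assuming candidate fits \hat{q}_m at quantile level \tau:

\[
\rho_{\tau}(u) = u\,\big(\tau - \mathbf{1}\{u < 0\}\big),
\qquad
CV_{\tau}(w) = \frac{1}{n} \sum_{i=1}^{n} \rho_{\tau}\Big( y_i - \sum_{m=1}^{M} w_m \hat{q}_m^{(-i)}(x_i) \Big),
\]

with weights chosen by minimizing CV_{\tau}(w) over the weight simplex; the uniformity claim is over \tau ranging in a compact subset of (0,1).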

Probability estimation for large-margin classifiers

A novel method is proposed for estimating the class probability through sequential classifications, using features of interval estimation of large-margin classifiers; it is highly competitive against alternatives, especially when the dimension of the input greatly exceeds the sample size.

Statistical behavior and consistency of classification methods based on convex risk minimization

This study sheds light on the good performance of some recently proposed linear classification methods, including boosting and support vector machines, shows their limitations, and suggests possible improvements.
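Such consistency results are commonly phrased in terms of classification calibration; a sketch of the key object, assuming a classification-calibrated convex surrogate \phi and \eta(x) = P(Y = 1 \mid X = x) with \eta(x) \neq 1/2:

\[
f_{\phi}^{*}(x) = \arg\min_{\alpha \in \mathbb{R}} \; \mathbb{E}\big[\phi(Y\alpha) \mid X = x\big],
\qquad
\operatorname{sign}\big(f_{\phi}^{*}(x)\big) = \operatorname{sign}\big(2\eta(x) - 1\big),
\]

so minimizing the surrogate risk recovers the Bayes classification rule, which is the property underlying consistency of boosting and SVM-type methods.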

Support Vector Machines with a Reject Option

The problem of binary classification in which the classifier may abstain instead of classifying each observation is considered, and a double hinge loss function is derived that focuses on estimating conditional probabilities only in the vicinity of the threshold points of the optimal decision rule.