High-dimensional classification via nonparametric empirical Bayes and maximum likelihood inference

Lee H. Dicker and Sihai Dave Zhao
We propose new nonparametric empirical Bayes methods for high-dimensional classification. Our classifiers are designed to approximate the Bayes classifier in a hypothesized hierarchical model, where the prior distributions for the model parameters are estimated nonparametrically from the training data. As is common with nonparametric empirical Bayes, the proposed classifiers are effective in high-dimensional settings even when the underlying model parameters are in fact nonrandom. We use… 


High‐dimensional classification based on nonparametric maximum likelihood estimation under unknown and inhomogeneous variances
We propose a new method for high-dimensional classification based on estimation of a high-dimensional mean vector under unknown and unequal variances. Our proposed method is based on a semi-parametric…
On the nonparametric maximum likelihood estimator for Gaussian location mixture densities with application to Gaussian denoising
The results imply, in particular, that every NPMLE achieves near-parametric risk (up to logarithmic multiplicative factors) when the true density is a discrete Gaussian mixture, without any prior information on the number of mixture components.
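To make the NPMLE concrete, here is a minimal sketch (the function names and the fixed-grid simplification are this note's own, not from the paper): the Kiefer–Wolfowitz NPMLE of a Gaussian location mixture can be approximated by restricting the atoms to a fixed grid and running EM on the mixing weights; the fitted mixture then yields posterior-mean denoising of each observation.

```python
import numpy as np

def npmle_fixed_grid(x, grid_size=100, n_iter=100):
    """Approximate the Kiefer-Wolfowitz NPMLE of a Gaussian location
    mixture (unit noise variance) on a fixed grid of candidate atoms,
    updating the mixing weights by EM."""
    grid = np.linspace(x.min(), x.max(), grid_size)   # candidate atoms
    w = np.full(grid_size, 1.0 / grid_size)           # uniform start
    # likelihood matrix: L[i, j] = phi(x_i - grid_j)
    L = np.exp(-0.5 * (x[:, None] - grid[None, :]) ** 2) / np.sqrt(2 * np.pi)
    for _ in range(n_iter):
        post = L * w                                   # unnormalized posteriors
        post /= post.sum(axis=1, keepdims=True)
        w = post.mean(axis=0)                          # EM weight update
    return grid, w

def posterior_mean(x, grid, w):
    """Empirical Bayes (posterior mean) denoising under the fitted mixture.
    The Gaussian normalizing constant cancels, so it is omitted."""
    L = np.exp(-0.5 * (x[:, None] - grid[None, :]) ** 2)
    return (L * w * grid).sum(axis=1) / (L * w).sum(axis=1)
```

On data drawn from a two-point mixture, the denoised values typically have a much smaller squared error against the true means than the raw observations, which is the empirical Bayes gain the surrounding papers quantify.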
An Empirical Bayes Approach for High Dimensional Classification
We propose an empirical Bayes estimator based on a Dirichlet process mixture model for estimating the sparse normalized mean difference, which can be directly applied to the high-dimensional linear…
Learning from a lot: Empirical Bayes in high-dimensional prediction settings
It is argued that empirical Bayes is particularly useful when the prior contains multiple parameters that model a priori information on variables, termed "co-data", and two novel examples that allow for co-data are presented.
Multivariate, Heteroscedastic Empirical Bayes via Nonparametric Maximum Likelihood
An oracle inequality implying that the empirical Bayes estimator performs at nearly the optimal level (up to logarithmic factors) for denoising without prior knowledge is proved.
Hierarchical Bayes Modeling for Large-Scale Inference.
Bayesian modeling is now ubiquitous in problems of large-scale inference, even when frequentist criteria are in mind for evaluating the performance of a procedure. By far the most popular in the…
REBayes: An R Package for Empirical Bayes Mixture Methods
An implementation contained in the R package REBayes is described with applications to a wide variety of mixture settings: Gaussian location and scale, Poisson and binomial mixtures for discrete data, Weibull and Gompertz models for survival data, and several Gaussian models intended for longitudinal data.
Self-regularizing Property of Nonparametric Maximum Likelihood Estimator in Mixture Models
It is shown that with high probability the NPMLE based on a sample of size n has $O(\log n)$ atoms (mass points), significantly improving the deterministic upper bound of $n$ due to Lindsay (1983).
REBayes: Empirical Bayes Mixture Methods in R
Models of unobserved heterogeneity, or frailty as it is commonly known in survival analysis, can often be formulated as semiparametric mixture models and estimated by maximum likelihood, as proposed…


Application of Non Parametric Empirical Bayes Estimation to High Dimensional Classification
The FAIR method improves the design of naive-Bayes classifiers in sparse setups, and shows that a good alternative to variable selection is estimation of the means through a certain nonparametric empirical Bayes procedure.
A Direct Estimation Approach to Sparse Linear Discriminant Analysis
A simple and effective classifier is introduced by estimating the product Ωδ directly through constrained ℓ1 minimization; it has superior finite-sample performance and significant computational advantages over existing methods that require separate estimation of Ω and δ.
This work suggests an easily computed estimator μ̂ such that the ratio of its risk E(μ̂ − μ)² to that of the Bayes procedure approaches 1.
A direct approach to sparse discriminant analysis in ultra-high dimensions
The theory shows that the proposed method can consistently identify the subset of discriminative features contributing to the Bayes rule and, at the same time, consistently estimate the Bayes classification direction, even when the dimension grows faster than any polynomial order of the sample size.
General maximum likelihood empirical Bayes estimation of normal means
Simulation experiments demonstrate that the GMLEB outperforms the James–Stein and several state-of-the-art threshold estimators in a wide range of settings without much downside.
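For context, the James–Stein baseline mentioned in the snippet above is simple to state. A positive-part sketch (assuming a known, homoscedastic noise variance; the function name is illustrative):

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    """Positive-part James-Stein estimator of a normal mean vector:
    shrinks the observation x toward the origin, with the shrinkage
    factor truncated at zero. Assumes known noise variance sigma2."""
    p = x.size
    shrink = max(0.0, 1.0 - (p - 2) * sigma2 / np.sum(x ** 2))
    return shrink * x
```

Because the shrinkage factor lies in [0, 1), the estimate always has smaller norm than the raw observation; the GMLEB generalizes this kind of shrinkage by learning the prior nonparametrically.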
Variance estimation in high-dimensional linear models
The residual variance and the proportion of explained variation are important quantities in many statistical models and model-fitting procedures. They play an important role in regression diagnostics…
Penalized classification using Fisher's linear discriminant
  • D. Witten, R. Tibshirani
  • Journal of the Royal Statistical Society, Series B (Statistical Methodology)
  • 2011
This work proposes penalized LDA, a general approach for penalizing the discriminant vectors in Fisher's discriminant problem in a way that leads to greater interpretability, and uses a minorization–maximization approach to optimize it efficiently when convex penalties are applied to the discriminant vectors.
CODA: high dimensional copula discriminant analysis
In high-dimensional settings, it is proved that the sparsity pattern of the discriminant features can be consistently recovered at the parametric rate, and that the expected misclassification error converges to the Bayes risk.
A road to classification in high dimensional space: the regularized optimal affine discriminant
A delicate result on continuous piecewise-linear solution paths for the ROAD optimization problem at the population level justifies the linear interpolation of the constrained coordinate descent algorithm.
Regression Shrinkage and Selection via the Lasso
A new method for estimation in linear models, called the lasso, minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant.
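The constrained form quoted above is equivalent, via the Lagrangian, to penalizing the ℓ1 norm of the coefficients. A minimal cyclic coordinate-descent sketch of the penalized form (the function name and defaults are this note's own, not from the paper):

```python
import numpy as np

def lasso_cd(X, y, alpha, n_iter=200):
    """Lasso via cyclic coordinate descent:
    minimize (1 / 2n) * ||y - X b||^2 + alpha * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()           # residual y - X b (b starts at 0)
    col_sq = (X ** 2).sum(axis=0) / n    # per-coordinate curvature
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]          # add back coordinate j's contribution
            rho = X[:, j] @ r / n
            # soft-thresholding: the closed-form coordinate-wise minimizer
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b
```

The soft-thresholding update sets to exactly zero any coefficient whose marginal correlation with the residual falls below alpha, which is the source of the lasso's variable selection.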