The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square

@article{Sur2019TheLR,
  title={The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square},
  author={Pragya Sur and Yuxin Chen and Emmanuel J. Cand{\`e}s},
  journal={Probability Theory and Related Fields},
  year={2019},
  pages={1--72}
}
Logistic regression is used thousands of times a day to fit data, predict future outcomes, and assess the statistical significance of explanatory variables. When used for the purpose of statistical inference, logistic models produce p-values for the regression coefficients by using an approximation to the distribution of the likelihood-ratio test (LRT). Indeed, Wilks’ theorem asserts that whenever we have a fixed number p of variables, twice the log-likelihood ratio (LLR) $2\Lambda$ is…
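The quantity under study is easy to probe numerically. The sketch below is a minimal illustration (not the authors' code; the design scaling, sample sizes, and function names are choices made here): it fits the unpenalized logistic MLE by Newton–Raphson under a global null in the regime $\kappa = p/n = 0.2$ and computes twice the log-likelihood ratio $2\Lambda$ for testing a single coefficient.

```python
import numpy as np

def fit_logistic_mle(X, y, n_iter=100, tol=1e-10):
    """Unpenalized logistic-regression MLE via Newton-Raphson (no intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))       # fitted probabilities
        grad = X.T @ (y - mu)                        # score vector
        H = X.T @ (X * (mu * (1.0 - mu))[:, None])   # observed information
        step = np.linalg.solve(H, grad)
        beta += step
        if np.linalg.norm(step) < tol:
            break
    return beta

def log_lik(X, y, beta):
    """Logistic log-likelihood sum_i [y_i * eta_i - log(1 + e^eta_i)]."""
    eta = X @ beta
    return float(y @ eta - np.logaddexp(0.0, eta).sum())

rng = np.random.default_rng(0)
n, p = 400, 80                                  # kappa = p/n = 0.2
X = rng.standard_normal((n, p)) / np.sqrt(n)    # Gaussian design, variance 1/n
y = rng.binomial(1, 0.5, size=n).astype(float)  # global null: all coefficients 0

# LRT for H0: beta_1 = 0 -- twice the log-likelihood ratio, 2*Lambda
ll_full = log_lik(X, y, fit_logistic_mle(X, y))
ll_reduced = log_lik(X[:, 1:], y, fit_logistic_mle(X[:, 1:], y))
two_lambda = 2.0 * (ll_full - ll_reduced)
print(f"2*Lambda = {two_lambda:.4f}")
```

Repeating this over many draws and comparing the empirical quantiles of $2\Lambda$ against $\chi^2_1$ quantiles exhibits the inflation that the paper's rescaling factor $\alpha(\kappa) > 1$ corrects: naive $\chi^2_1$ p-values are too small in this regime.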
A modern maximum-likelihood theory for high-dimensional logistic regression
  • P. Sur, E. Candès
  • Mathematics
    Proceedings of the National Academy of Sciences
  • 2019
It is proved that the maximum-likelihood estimate (MLE) is biased, the variability of the MLE is far greater than classically estimated, and the likelihood-ratio test (LRT) is not distributed as a χ².
The asymptotic distribution of the MLE in high-dimensional logistic models: Arbitrary covariance
We study the distribution of the maximum likelihood estimate (MLE) in high-dimensional logistic models, extending the recent results from Sur (2019) to the case where the Gaussian covariates may have an arbitrary covariance structure.
The Existence of Maximum Likelihood Estimate in High-Dimensional Generalized Linear Models with Binary Responses.
It is established that the existence of the maximum likelihood estimate (MLE) exhibits a phase transition for a wide range of generalized linear models (GLMs) with binary responses and elliptical covariates.
The Impact of Regularization on High-dimensional Logistic Regression
This paper studies regularized logistic regression (RLR), where a convex regularizer that encourages the desired structure is added to the negative of the log-likelihood function, and provides a precise analysis of the performance of RLR via the solution of a system of six nonlinear equations.
Global and Simultaneous Hypothesis Testing for High-Dimensional Logistic Regression Models
Global testing and large-scale multiple testing for the regression coefficients are considered in both single- and two-regression settings, and a lower bound for global testing is established, showing that the proposed test is asymptotically minimax optimal over a range of sparsity levels.
Non-Asymptotic Behavior of the Maximum Likelihood Estimate of a Discrete Distribution
In this paper, we study the maximum likelihood estimate of the probability mass function (pmf) of $n$ independent and identically distributed (i.i.d.) random variables, in the non-asymptotic regime.
Moderate-Dimensional Inferences on Quadratic Functionals in Ordinary Least Squares
Statistical inference for quadratic functionals of the linear-regression parameter has found wide application, including signal detection, global testing, and inference on the error variance and the fraction of…
Replica analysis of overfitting in generalized linear regression models
The results, illustrated by application to linear, logistic, and Cox regression, enable one to correct ML and MAP inferences in GLMs systematically for overfitting bias, and thus extend their applicability into the hitherto forbidden regime $p = O(N)$.
The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression
This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp `phase transition'.
Consistent Risk Estimation in High-Dimensional Linear Regression
This paper studies the problem of risk estimation in the high-dimensional asymptotic setting and proves the consistency of three risk estimates that have been successful in numerical studies: leave-one-out cross-validation (LOOCV), approximate leave-one-out (ALO), and approximate message passing (AMP)-based techniques.
…
