Corpus ID: 238419483

High Dimensional Logistic Regression Under Network Dependence

Somabha Mukherjee, Sagnik Halder, Bhaswar B. Bhattacharya, George Michailidis
Abstract. Logistic regression is one of the most fundamental methods for modeling the probability of a binary outcome based on a collection of covariates. However, the classical formulation of logistic regression relies on the independent sampling assumption, which is often violated when the outcomes interact through an underlying network structure, such as over a temporal/spatial domain or on a social network. This necessitates the development of models that can simultaneously handle both the… 
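For context, the classical i.i.d. model that these works generalize can be sketched in a few lines. The snippet below is a minimal illustration (data, step size, and iteration count are assumptions for the toy example), fitting logistic regression by gradient ascent on the log-likelihood; it is not the network-dependent method of the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Maximize the i.i.d. log-likelihood sum_i [y_i x_i'b - log(1 + exp(x_i'b))]."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (y - sigmoid(X @ beta)) / n  # score function of the likelihood
        beta += lr * grad
    return beta

# Toy data with known coefficients (illustrative settings)
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
true_beta = np.array([1.0, -2.0, 0.5])
y = rng.binomial(1, sigmoid(X @ true_beta))  # independent Bernoulli responses
beta_hat = fit_logistic(X, y)
```

Under network dependence the responses are no longer independent given the covariates, so this likelihood factorization, and hence this fitting procedure, breaks down; that is the gap the paper addresses.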


Logistic-Regression with peer-group effects via inference in higher order Ising models
This work models binary outcomes on a network as a higher-order spin glass, where the behavior of an individual depends on a linear function of their own vector of covariates and a polynomial function of the outcomes of others, capturing peer-group effects.
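A schematic form of such a model (the notation here is illustrative, not the paper's exact specification): each binary outcome $y_i \in \{-1,+1\}$ is assumed to satisfy

$$\mathbb{P}(y_i = 1 \mid y_{-i}, x_i) \propto \exp\big(\beta^\top x_i + \theta\, g_i(y_{-i})\big),$$

where $g_i$ is a polynomial in the other outcomes (e.g., a sum over neighboring pairs or higher-order tuples), so that $\beta$ captures covariate effects and $\theta$ the strength of the peer-group dependence.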
Regression from dependent observations
This work presents computationally and statistically efficient methods for linear and logistic regression models when the response variables are dependent on a network, and proves strong consistency results for recovering the vector of coefficients and the strength of the dependencies.
High-dimensional Ising model selection using ℓ1-regularized logistic regression
It is proved that consistent neighborhood selection can be obtained for sample sizes $n=\Omega(d^3\log p)$ with exponentially decaying error, and when these same conditions are imposed directly on the sample matrices, it is shown that a reduced sample size suffices for the method to estimate neighborhoods consistently.
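The neighborhood-selection idea can be sketched concretely: in an Ising model, the conditional law of one spin given the rest is logistic in the neighboring spins, so an $\ell_1$-penalized logistic regression of one node on the others recovers its neighborhood. The snippet below is a minimal illustration using ISTA (proximal gradient); the data-generating settings and thresholds are assumptions, not the paper's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_logistic(X, y, lam=0.05, lr=0.1, n_iter=3000):
    """ISTA for l1-penalized logistic regression (neighborhood selection)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (y - sigmoid(X @ beta)) / n  # gradient of log-likelihood
        beta = soft_threshold(beta + lr * grad, lr * lam)  # proximal l1 step
    return beta

# Conditional distribution of a target spin given the others is logistic in
# the neighbor spins, so the nonzero coefficients identify the neighborhood.
rng = np.random.default_rng(1)
n, p = 4000, 20
X = rng.choice([-1.0, 1.0], size=(n, p))      # the other spins
theta = np.zeros(p)
theta[[2, 7, 11]] = 0.8                        # true neighbors of the target node
y = rng.binomial(1, sigmoid(2 * X @ theta))    # target spin, coded 0/1
beta_hat = l1_logistic(X, y)
support = set(np.flatnonzero(np.abs(beta_hat) > 0.1))
```

The penalty zeroes out non-neighbors exactly, so `support` recovers the true neighborhood {2, 7, 11} in this toy setup; repeating the regression at every node and combining the estimated neighborhoods yields the graph.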
Optimal Single Sample Tests for Structured versus Unstructured Network Data
This work develops a new approach that applies to both the Ising and Exponential Random Graph settings, based on a general and natural statistical test that can distinguish the hypotheses with high probability above a certain threshold in the (inverse) temperature parameter, and that is optimal in the sense that below the threshold no test can distinguish the hypotheses.
Hidden Markov Models and Disease Mapping
We present new methodology to extend hidden Markov models to the spatial domain, and use this class of models to analyze spatial heterogeneity of count data on a rare phenomenon. This situation…
A Semiparametric Two-Sample Hypothesis Testing Problem for Random Graphs
A semiparametric problem of two-sample hypothesis testing for a class of latent position random graphs is considered; a notion of consistency is formulated, and a valid test is proposed for the hypothesis that two finite-dimensional random dot product graphs on a common vertex set have the same generating latent positions.
High-dimensional structure estimation in Ising models: Local separation criterion
A novel criterion for tractable graph families is introduced, based on the presence of sparse local separators between node pairs in the underlying graph, for which this estimation method is efficient.
The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression
This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp phase transition.
Nonconcave penalized composite conditional likelihood estimation of sparse Ising models
This work proposes efficient procedures for learning a sparse Ising model based on a penalized composite conditional likelihood with nonconcave penalties, demonstrates their finite-sample performance via simulation studies, and illustrates them by studying the structure of the Human Immunodeficiency Virus type 1 protease.
The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square
It is proved that when p is not negligible compared to n, Wilks’ theorem does not hold and that the Chi-square approximation is grossly incorrect; in fact, this approximation produces p-values that are far too small (under the null hypothesis).