• Corpus ID: 198229385

Graph inference with clustering and false discovery rate control

@article{Rebafka2019GraphIW,
  title={Graph inference with clustering and false discovery rate control},
  author={Tabea Rebafka and {\'E}tienne Roquain and Fanny Villers},
  journal={arXiv: Statistics Theory},
  year={2019}
}
In this paper, a noisy version of the stochastic block model (NSBM) is introduced, and we investigate the following three statistical inferences in this model: estimation of the model parameters, clustering of the nodes, and identification of the underlying graph. While the first two inferences are carried out with a variational expectation-maximization (VEM) algorithm, the graph inference is done by controlling the false discovery rate (FDR), that is, the average proportion of errors among the… 
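As a generic illustration of what controlling the FDR at a level alpha means, the classical Benjamini-Hochberg (BH) step-up rule can be sketched as below. This is a minimal sketch and not the procedure of the paper, whose description is truncated above; the function name `benjamini_hochberg` and the example p-values are hypothetical, and the rule is only known to control the FDR under independence or certain forms of positive dependence of the p-values.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Generic BH step-up rule (illustrative, not the paper's method):
    sort the p-values, find the largest rank k such that
    p_(k) <= alpha * k / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    # Indices of the p-values sorted in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank  # keep the largest rank that meets its threshold
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one per candidate edge of a graph.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

The step-up nature matters: a p-value that misses its own threshold is still rejected when a larger p-value meets a later threshold.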

On using empirical null distributions in Benjamini-Hochberg procedure.
TLDR
This work explores the issue of sparsity boundaries in the most classical case where the null distributions are Gaussian with unknown rescaling parameters (mean and variance) and where the Benjamini-Hochberg (BH) procedure is applied after a data-rescaling step.
False discovery rate control with unknown null distribution: Is it possible to mimic the oracle?
Classical multiple testing theory prescribes the null distribution, which is often too stringent an assumption for today's large-scale experiments. This paper presents theoretical foundations to
False discovery rate control with unknown null distribution: Is it possible to mimic the oracle? By Etienne Roquain
Classical multiple testing theory prescribes the null distribution, which is often too stringent an assumption for today's large-scale experiments. This paper presents theoretical foundations to
Empirical Bayes cumulative $\ell$-value multiple testing procedure for sparse sequences
In the sparse sequence model, we consider a popular Bayesian multiple testing procedure and investigate for the first time its behaviour from the frequentist point of view. Given a spike-and-slab
False clustering rate control in mixture models
TLDR
The purpose of this paper is to revisit this approach in an unsupervised mixture-model framework, formalized in terms of controlling the false clustering rate (FCR) below a prescribed level α, while maximizing the number of classified items.
False clustering rate in mixture models
TLDR
The purpose of this paper is to revisit this approach in an unsupervised mixture-model framework, formalized in terms of controlling the false clustering rate (FCR) below a prescribed level α, while maximizing the number of classified items.
Multiple Testing in Nonparametric Hidden Markov Models: An Empirical Bayes Approach
TLDR
A procedure is introduced, based on nonparametric empirical Bayes ideas, that controls the False Discovery Rate at a user-specified level and requires supremum-norm convergence of preliminary estimators of the emission densities of the HMM.
Post hoc false discovery proportion inference under a Hidden Markov Model
TLDR
A methodology to construct confidence bounds on the false discovery proportion (FDP), for a user-selected set of hypotheses that can depend on the observed data in an arbitrary way, and a bootstrap-based methodology to take into account the effect of parameter estimation error.

References

Gaussian graphical model estimation with false discovery rate control
TLDR
This paper proposes a simultaneous testing procedure for conditional dependence in GGM by a multiple testing procedure that can control the false discovery rate (FDR) asymptotically and the numerical performance shows that the method works quite well.
Consistency and asymptotic normality of stochastic block models estimators from sampled data
TLDR
It is proved that maximum likelihood estimators and their variational approximations are consistent and asymptotically normal in the presence of missing data as soon as the sampling probability $\rho$ satisfies $\rho\gg\log(n)/n$.
A mixture model for random graphs
TLDR
The degree distribution and the clustering coefficient associated with this model are given, and a variational method to estimate its parameters and a model selection criterion to select the number of classes are proposed, which allows us to deal with large networks containing thousands of vertices.
The positive false discovery rate: a Bayesian interpretation and the q-value
TLDR
This work introduces a modified version of the FDR called the “positive false discovery rate” (pFDR), which can be written as a Bayesian posterior probability and can be connected to classification theory.
High-dimensional graphs and variable selection with the Lasso
TLDR
It is shown that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs and is hence equivalent to variable selection for Gaussian linear models.
Parameter identifiability in a class of random graph mixture models
On the performance of FDR control: Constraints and a partial solution
The False Discovery Rate (FDR) paradigm aims to attain certain control on Type I errors with relatively high power for multiple hypothesis testing. The Benjamini–Hochberg (BH) procedure is a
The control of the false discovery rate in multiple testing under dependency
Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR
Large-scale multiple testing under dependence
Summary. The paper considers the problem of multiple testing under dependence in a compound decision theoretic framework. The observed data are assumed to be generated from an underlying two-state
Multiple testing under dependence via graphical models
TLDR
This work proposes a multiple testing procedure based on a Markov-random-field-coupled mixture model, which is applied to a real-world genome-wide association study on breast cancer, and identifies several SNPs with strong association evidence.