# Graph inference with clustering and false discovery rate control

@article{Rebafka2019GraphIW, title={Graph inference with clustering and false discovery rate control}, author={Tabea Rebafka and {\'E}tienne Roquain and Fanny Villers}, journal={arXiv: Statistics Theory}, year={2019} }

In this paper, a noisy version of the stochastic block model (NSBM) is introduced and we investigate the following three statistical inference tasks in this model: estimation of the model parameters, clustering of the nodes and identification of the underlying graph. While the first two are carried out with a variational expectation-maximization (VEM) algorithm, the graph inference is done by controlling the false discovery rate (FDR), that is, the average proportion of errors among the…

## 8 Citations

On using empirical null distributions in Benjamini-Hochberg procedure.

- Mathematics, Computer Science
- 2019

This work explores the issue of sparsity boundaries in the most classical case, where the null distributions are Gaussian with unknown rescaling parameters (mean and variance) and where the Benjamini-Hochberg (BH) procedure is applied after a data-rescaling step.
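The setting described above (a Gaussian null with unknown mean and variance, rescaling, then BH) can be illustrated with a minimal sketch. The median and MAD estimators used for the rescaling step are an assumption made here for illustration and are not taken from the cited work; the function names `bh_reject` and `bh_with_empirical_null` are likewise hypothetical.

```python
import numpy as np
from scipy.stats import norm


def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # BH compares the k-th smallest p-value with alpha * k / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # last sorted index passing its threshold
        reject[order[: k + 1]] = True     # step-up: reject everything up to it
    return reject


def bh_with_empirical_null(x, alpha=0.05):
    """Rescale data with robust null estimates, then apply BH.

    Assumption (illustrative only): the null mean is estimated by the median
    and the null standard deviation by the rescaled MAD.
    """
    x = np.asarray(x, dtype=float)
    mu_hat = np.median(x)
    # MAD divided by the 0.75 Gaussian quantile is consistent for sigma.
    sigma_hat = np.median(np.abs(x - mu_hat)) / norm.ppf(0.75)
    z = (x - mu_hat) / sigma_hat
    pvalues = 2.0 * norm.sf(np.abs(z))    # two-sided Gaussian p-values
    return bh_reject(pvalues, alpha)
```

Robust estimators are a natural choice in this sketch because a small fraction of non-null observations barely moves the median or the MAD, whereas it can noticeably bias the sample mean and variance.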

False discovery rate control with unknown null distribution: Is it possible to mimic the oracle?

- Mathematics (The Annals of Statistics)
- 2022

Classical multiple testing theory prescribes the null distribution, which is often too stringent an assumption for today's large-scale experiments. This paper presents theoretical foundations to…

False discovery rate control with unknown null distribution: Is it possible to mimic the oracle? By Etienne Roquain

- Mathematics
- 2021

Classical multiple testing theory prescribes the null distribution, which is often too stringent an assumption for today's large-scale experiments. This paper presents theoretical foundations to…

Empirical Bayes cumulative $\ell$-value multiple testing procedure for sparse sequences

- Mathematics
- 2021

In the sparse sequence model, we consider a popular Bayesian multiple testing procedure and investigate for the first time its behaviour from the frequentist point of view. Given a spike-and-slab…

False clustering rate control in mixture models

- Computer Science
- 2022

The purpose of this paper is to revisit this approach in an unsupervised mixture-model framework, formalized in terms of controlling the false clustering rate (FCR) below a prescribed level α, while maximizing the number of classified items.

False clustering rate in mixture models

- Computer Science
- 2022

The purpose of this paper is to revisit this approach in an unsupervised mixture-model framework, formalized in terms of controlling the false clustering rate (FCR) below a prescribed level α, while maximizing the number of classified items.

Multiple Testing in Nonparametric Hidden Markov Models: An Empirical Bayes Approach

- Mathematics, Computer Science
- 2021

A procedure is introduced, based on nonparametric empirical Bayes ideas, that controls the False Discovery Rate at a user-specified level and requires supremum-norm convergence of preliminary estimators of the emission densities of the HMM.

Post hoc false discovery proportion inference under a Hidden Markov Model

- Computer Science, Mathematics
- 2021

A methodology to construct confidence bounds on the false discovery proportion (FDP), for a user-selected set of hypotheses that can depend on the observed data in an arbitrary way, and a bootstrap-based methodology to take into account the effect of parameter estimation error.

## References

SHOWING 1-10 OF 44 REFERENCES

Gaussian graphical model estimation with false discovery rate control

- Computer Science
- 2013

This paper proposes a simultaneous testing procedure for conditional dependence in Gaussian graphical models (GGM) that controls the false discovery rate (FDR) asymptotically; the numerical performance shows that the method works quite well.

Consistency and asymptotic normality of stochastic block models estimators from sampled data

- Mathematics, Computer Science
- 2019

It is proved that maximum likelihood estimators and their variational approximations are consistent and asymptotically normal in the presence of missing data as soon as the sampling probability $\rho$ satisfies $\rho\gg\log(n)/n$.

A mixture model for random graphs

- Computer Science, Mathematics (Stat. Comput.)
- 2008

The degree distribution and the clustering coefficient associated with this model are given, and a variational method to estimate its parameters and a model selection criterion to select the number of classes are proposed, which allows us to deal with large networks containing thousands of vertices.

The positive false discovery rate: a Bayesian interpretation and the q-value

- Computer Science
- 2003

This work introduces a modified version of the FDR called the “positive false discovery rate” (pFDR), which can be written as a Bayesian posterior probability and can be connected to classification theory.
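As a worked statement of the summary above, the pFDR and its Bayesian form can be written out as follows. The notation is an assumption introduced here: $V$ is the number of false rejections, $R$ the total number of rejections, $\Gamma$ a common rejection region for the test statistics $T_i$, and $H_i \in \{0, 1\}$ indicates whether the $i$-th null is false, with the $H_i$ i.i.d. and $\pi_0 = P(H = 0)$.

$$
\mathrm{pFDR} = \mathbb{E}\!\left[\frac{V}{R} \,\middle|\, R > 0\right],
\qquad
\mathrm{pFDR}(\Gamma) = P(H = 0 \mid T \in \Gamma) = \frac{\pi_0 \, P(T \in \Gamma \mid H = 0)}{P(T \in \Gamma)}.
$$

The second identity holds under this two-group mixture assumption with identical, independent tests; it is what makes the pFDR readable as a posterior probability of a null hypothesis given that its statistic fell in the rejection region.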

High-dimensional graphs and variable selection with the Lasso

- Computer Science
- 2006

It is shown that neighborhood selection with the Lasso is a computationally attractive alternative to standard covariance selection for sparse high-dimensional graphs; it estimates the conditional independence restrictions separately for each node and is hence equivalent to variable selection for Gaussian linear models.

Parameter identifiability in a class of random graph mixture models

- Mathematics, Computer Science
- 2010

On the performance of FDR control: Constraints and a partial solution

- Mathematics
- 2007

The False Discovery Rate (FDR) paradigm aims to attain certain control on Type I errors with relatively high power for multiple hypothesis testing. The Benjamini–Hochberg (BH) procedure is a…

The control of the false discovery rate in multiple testing under dependency

- Mathematics
- 2001

Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR…

Large-scale multiple testing under dependence

- Mathematics
- 2009

Summary. The paper considers the problem of multiple testing under dependence in a compound decision theoretic framework. The observed data are assumed to be generated from an underlying two‐state…

Multiple testing under dependence via graphical models

- Computer Science
- 2016

This work proposes a multiple testing procedure based on a Markov-random-field-coupled mixture model, which is applied to a real-world genome-wide association study on breast cancer, and identifies several SNPs with strong association evidence.