Optimal regularizations for data generation with probabilistic graphical models

Arnaud Fanthomme, Felipe B. Rizzato, Simona Cocco and Rémi Monasson. Journal of Statistical Mechanics: Theory and Experiment.
Understanding the role of regularization is a central question in statistical inference. Empirically, well-chosen regularization schemes often dramatically improve the quality of the inferred models by avoiding overfitting of the training data. We consider here the particular case of L2 regularization in the maximum a posteriori (MAP) inference of generative pairwise graphical models. Based on analytical calculations on Gaussian multivariate distributions and numerical experiments on Gaussian…
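
To make the setting concrete, here is a minimal numpy sketch (not the paper's analytical calculation) of ridge-type regularization in precision-matrix estimation for a multivariate Gaussian: the empirical covariance is shrunk by adding a constant to its diagonal before inversion, which plays the role of an L2 penalty on the inferred couplings. All dimensions and the regularization strength are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth precision matrix of a 5-variable Gaussian model (tridiagonal couplings)
d = 5
J_true = np.eye(d) + 0.3 * (np.diag(np.ones(d - 1), 1) + np.diag(np.ones(d - 1), -1))
cov_true = np.linalg.inv(J_true)

# Finite sample: the empirical covariance carries sampling noise
n = 50
X = rng.multivariate_normal(np.zeros(d), cov_true, size=n)
C = np.cov(X, rowvar=False)

def map_precision(C, lam):
    """Ridge-type shrinkage estimate of the precision matrix:
    invert the empirical covariance after adding lam to its diagonal."""
    return np.linalg.inv(C + lam * np.eye(C.shape[0]))

err = lambda J: np.linalg.norm(J - J_true)
# Compare reconstruction error with and without regularization
print(err(map_precision(C, 0.0)), err(map_precision(C, 0.1)))
```

The shrinkage damps the smallest eigenvalues of C, which are the ones most corrupted by sampling noise and most amplified by the inversion.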

ACE: adaptive cluster expansion for maximum entropy graphical model inference

The adaptive cluster expansion (ACE) method for quickly and accurately inferring Ising or Potts models from correlation data is described, and models inferred by ACE are shown to have substantially better statistical performance than those obtained from faster Gaussian and pseudo-likelihood methods.

Large pseudocounts and L2-norm penalties are necessary for the mean-field inference of Ising and Potts models

It is argued, based on the analysis of small systems, that the optimal value of the regularization strength remains finite even if the sampling noise tends to zero, in order to correct for systematic biases introduced by the mean-field (MF) approximation.
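
The mean-field inversion at issue here can be sketched in a few lines of numpy: couplings are read off from the inverse of the connected correlation matrix, J = -C^{-1}. The pseudocount below mixes the empirical statistics with those of a uniform model; the specific mixing convention is an illustrative assumption, not the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(1)

# Binary spin samples (values +/-1); here drawn uniformly for illustration
n, d = 200, 4
S = rng.choice([-1, 1], size=(n, d))

def mean_field_couplings(S, alpha):
    """Naive mean-field inference of Ising couplings, J = -C^{-1} off-diagonal,
    with a pseudocount alpha mixing the data statistics toward a uniform model."""
    m = (1 - alpha) * S.mean(axis=0)                          # regularized magnetizations
    chi = (1 - alpha) * (S.T @ S) / len(S) + alpha * np.eye(S.shape[1])
    C = chi - np.outer(m, m)                                  # connected correlations
    J = -np.linalg.inv(C)
    np.fill_diagonal(J, 0.0)                                  # keep only pairwise couplings
    return J

J = mean_field_couplings(S, alpha=0.1)
```

The pseudocount keeps C well-conditioned even when some correlations are poorly sampled, which is exactly where the unregularized inversion blows up.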

Inference of compressed Potts graphical models

A double regularization scheme is studied, in which the number of Potts states (colors) available to each variable is reduced and interaction networks are made sparse. It is shown in particular that color compression does not affect the quality of reconstruction of the parameters corresponding to high-frequency symbols, while drastically reducing the number of the other parameters and thus the computational time.
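
A toy illustration of the color-compression idea: symbols whose empirical frequency falls below a threshold are merged into a single state, so only the well-sampled colors keep individual parameters. The function name, threshold fmin, and the 'rare' label are illustrative, not the paper's scheme.

```python
import numpy as np

def compress_colors(column, fmin=0.05):
    """Map low-frequency symbols in a categorical column to a single
    'rare' state, keeping high-frequency symbols unchanged."""
    values, counts = np.unique(column, return_counts=True)
    freqs = counts / len(column)
    keep = set(values[freqs >= fmin])
    return np.array([v if v in keep else "rare" for v in column])

# 'C' and 'X' each appear once (frequency 0.1), below the 0.2 threshold
col = np.array(list("AAAAABBBCX"))
print(compress_colors(col, fmin=0.2))
```

After compression, a Potts variable over this column needs parameters for only three states (A, B, rare) instead of four.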

Generalisation error in learning with random features and the hidden manifold model

A closed-form expression is provided for the asymptotic generalisation performance of generalised linear regression and classification on a synthetically generated dataset, encompassing several problems of interest such as learning with random features, neural networks in the lazy training regime, and the hidden manifold model.
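
A minimal numpy sketch of the random-features model mentioned above: a frozen random projection followed by a ReLU nonlinearity, with only the linear readout trained by ridge regression. The dimensions, target function, and regularization strength are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: noisy scalar target from a 3-d input
n, d, p = 100, 3, 50          # samples, input dimension, number of random features
X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# Random-features model: fixed random first layer, trained linear readout
W = rng.normal(size=(d, p))                    # frozen random projection
Z = np.maximum(X @ W, 0.0)                     # ReLU features
lam = 1e-2
a = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)   # ridge-trained readout

train_mse = np.mean((Z @ a - y) ** 2)
```

The asymptotic theory characterizes how the test error of exactly this kind of model depends on the ratios n/d and p/d; the sketch only sets up the finite-size model.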

A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning

This paper provides a succinct overview of the emerging theory of overparameterized ML (henceforth abbreviated as TOPML), which explains recent empirical findings through a statistical signal processing perspective, and emphasizes the unique aspects that define TOPML as a subfield of modern ML theory.

Deep learning: a statistical viewpoint

This article surveys recent progress in statistical learning theory, provides examples illustrating its principles in simpler settings, and focuses specifically on the linear regime for neural networks, where the network can be approximated by a linear model.

Learning Sparse Neural Networks through L0 Regularization

A practical method for L0-norm regularization of neural networks: the network is pruned during training by encouraging weights to become exactly zero, which allows straightforward and efficient learning of model structures with stochastic gradient descent and enables conditional computation in a principled way.
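
The method relies on hard-concrete gates: a stretched, clipped sigmoid transform of uniform noise that puts finite probability mass exactly at 0 and 1, so that the expected number of non-zero gates is differentiable in the gate parameter. A numpy sketch with conventional stretch parameters (the specific constants are standard choices, assumed here rather than taken from the summary above):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hard-concrete gate: a stochastic relaxation of a 0/1 mask whose expected
# L0 cost is differentiable in the gate parameter log_alpha.
beta, gamma, zeta = 2.0 / 3.0, -0.1, 1.1

def sample_gate(log_alpha, size):
    u = rng.uniform(1e-6, 1 - 1e-6, size=size)
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1 - u) + log_alpha) / beta))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)   # stretch, then clip to [0, 1]

def expected_l0(log_alpha):
    """Probability that the gate is non-zero: the differentiable L0 penalty term."""
    return 1.0 / (1.0 + np.exp(-(log_alpha - beta * np.log(-gamma / zeta))))

gates = sample_gate(log_alpha=-2.0, size=10_000)
# A negative log_alpha pushes most gates to exactly zero (pruned weights)
print(np.mean(gates == 0.0), expected_l0(-2.0))
```

Because the clipping produces exact zeros, weights multiplied by these gates are genuinely pruned during training, not merely shrunk.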

The role of regularization in classification of high-dimensional noisy Gaussian mixture

A rigorous analysis is given of the generalization error of regularized convex classifiers, including ridge, hinge and logistic regression, in the high-dimensional limit where the number of samples n and the dimension d go to infinity at fixed ratio $\alpha = n/d$.
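
A quick numerical counterpart, assuming ridge (least-squares) classification on a synthetic two-cluster Gaussian mixture; the dimensions, signal strength, and grid of regularization values are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two-cluster noisy Gaussian mixture with labels +/-1 and means +/-mu
n, d = 200, 400                        # high-dimensional regime: alpha = n/d = 0.5
mu = 2.0 * np.ones(d) / np.sqrt(d)     # cluster mean with norm 2
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu[None, :] + rng.normal(size=(n, d))

def ridge_classifier(X, y, lam):
    """Least-squares fit of the labels with an L2 penalty; predict with sign(x @ w)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Fresh test set to estimate the generalization error at several regularizations
y_test = rng.choice([-1.0, 1.0], size=1000)
X_test = y_test[:, None] * mu[None, :] + rng.normal(size=(1000, d))
errs = [np.mean(np.sign(X_test @ ridge_classifier(X, y, lam)) != y_test)
        for lam in (1e-3, 1.0, 100.0)]
print(errs)
```

The theory predicts such error curves exactly as functions of alpha and the regularization strength; the simulation only probes a few points at finite size.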

Benign overfitting in linear regression

A characterization of linear regression problems for which the minimum norm interpolating prediction rule has near-optimal prediction accuracy shows that overparameterization is essential for benign overfitting in this setting: the number of directions in parameter space that are unimportant for prediction must significantly exceed the sample size.
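
The minimum norm interpolating rule discussed here is simply the pseudoinverse solution when there are more parameters than samples; a small numpy sketch (dimensions and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Overparameterized linear regression: more features (d) than samples (n)
n, d = 20, 200
w_true = np.zeros(d)
w_true[0] = 1.0                                # a single important direction
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

# Minimum-norm interpolating solution: w = X^+ y (Moore-Penrose pseudoinverse)
w = np.linalg.pinv(X) @ y

train_resid = np.max(np.abs(X @ w - y))        # interpolation: residuals are ~0
print(train_resid)
```

Despite fitting the noisy labels exactly, such a rule can predict well when, as the characterization above requires, the many unimportant directions absorb the noise.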

High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence

The first result establishes consistency of the estimate $\hat{\Theta}$ in the elementwise maximum norm, which allows convergence rates in Frobenius and spectral norms to be derived, and shows good correspondence between the theoretical predictions and the behavior in simulations.
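
For reference, the ℓ1-penalized log-determinant objective being minimized can be written down directly; the notation Θ for the precision matrix and C for the sample covariance is an assumption consistent with standard treatments of this estimator.

```python
import numpy as np

def penalized_logdet_objective(Theta, C, lam):
    """Objective of the l1-penalized log-determinant estimator: negative Gaussian
    log-likelihood (up to constants) plus an elementwise l1 penalty on Theta."""
    sign, logdet = np.linalg.slogdet(Theta)
    assert sign > 0, "Theta must be positive definite"
    return np.trace(C @ Theta) - logdet + lam * np.abs(Theta).sum()

# Sanity check on an identity model: trace = 3, logdet = 0, penalty = 0.1 * 3
d = 3
val = penalized_logdet_objective(np.eye(d), np.eye(d), lam=0.1)
print(val)   # -> 3.3
```

Minimizing this objective over positive-definite Θ yields the sparse precision-matrix estimate whose elementwise and spectral-norm consistency the result above establishes.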