Fast incremental expectation maximization for finite-sum optimization: nonasymptotic convergence

  • Gersende Fort, P. Gach, É. Moulines
  • Statistics and Computing
Fast incremental expectation maximization (FIEM) is a version of the EM framework for large datasets. In this paper, we first recast FIEM and other incremental EM-type algorithms in the Stochastic Approximation within EM framework. Then, we provide nonasymptotic bounds for the convergence in expectation as a function of the number of examples n and of the maximal number of iterations K_max.
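The Stochastic Approximation within EM framework replaces the full E-step by a stochastic update of the running sufficient statistics, followed by the usual M-step map. A minimal sketch of that idea on a toy two-component Gaussian mixture (the model, step-size schedule, and initialization are illustrative assumptions, not the paper's FIEM algorithm):

```python
import math
import random

def sa_em(data, steps=4000, seed=0):
    """Stochastic-approximation-within-EM sketch: draw one example,
    compute its E-step responsibilities, move the running sufficient
    statistics toward them, then apply the M-step map.
    Model: equal-weight mixture of N(mu[0], 1) and N(mu[1], 1)."""
    rng = random.Random(seed)
    mu = [-1.0, 1.0]                          # initial means
    # per-component statistics: (mean responsibility, mean weighted y)
    s = [[0.5, 0.5 * mu[0]], [0.5, 0.5 * mu[1]]]
    for k in range(1, steps + 1):
        y = data[rng.randrange(len(data))]
        w = [math.exp(-0.5 * (y - m) ** 2) for m in mu]  # E-step weights
        r = [wj / sum(w) for wj in w]                    # responsibilities
        gamma = (10 + k) ** -0.6                         # decreasing SA step
        for j in range(2):
            s[j][0] += gamma * (r[j] - s[j][0])
            s[j][1] += gamma * (r[j] * y - s[j][1])
            mu[j] = s[j][1] / s[j][0]                    # M-step map
    return sorted(mu)

rng = random.Random(1)
data = ([rng.gauss(-2.0, 1.0) for _ in range(200)]
        + [rng.gauss(2.0, 1.0) for _ in range(200)])
mu_hat = sa_em(data)  # estimated component means, roughly [-2, 2]
```

Incremental and variance-reduced variants such as FIEM differ in how the statistic update is built from stored per-example terms, not in this overall structure.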

Stochastic Variable Metric Proximal Gradient with variance reduction for non-convex composite optimization

This paper introduces a novel algorithm, the Perturbed Proximal Preconditioned SPIDER algorithm, which allows approximate preconditioned forward operator and uses a variable metric proximity operator as the backward operator; it also proposes a mini-batch strategy with variance reduction to address the finite sum setting.

Geom-Spider-EM: Faster Variance Reduced Stochastic Expectation Maximization for Nonconvex Finite-Sum Optimization

  • G. Fort, É. Moulines, Hoi-To Wai
  • Computer Science
    ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2021
This paper proposes an extension of the Stochastic Path-Integrated Differential EstimatoR EM (SPIDER-EM) and derives complexity bounds for this novel algorithm, designed to solve smooth nonconvex finite-sum optimization problems.

An online Minorization-Maximization algorithm

It is shown that an online version of the Minorization–Maximization (MM) algorithm, which includes the online EM algorithm as a special case, can be constructed in a similar manner.

The Perturbed Prox-Preconditioned Spider Algorithm: Non-Asymptotic Convergence Bounds

  • G. Fort, É. Moulines
  • Computer Science, Mathematics
    2021 IEEE Statistical Signal Processing Workshop (SSP)
  • 2021
A novel algorithm named Perturbed Prox-Preconditioned SPIDER (3P-SPIDER) is introduced. It is a stochastic variance-reduced proximal-gradient type algorithm built on the Stochastic Path-Integrated Differential EstimatoR (SPIDER).

Federated Expectation Maximization with heterogeneity mitigation and variance reduction

FedEM is a new communication-efficient method that handles partial participation of local devices and is robust to heterogeneous distributions of the datasets; the paper also develops and analyzes an extension of FedEM that further incorporates a variance reduction scheme.

The Perturbed Prox-Preconditioned Spider Algorithm for EM-Based Large Scale Learning

  • G. Fort, É. Moulines
  • Computer Science
    2021 IEEE Statistical Signal Processing Workshop (SSP)
  • 2021
The 3P-SPIDER algorithm addresses many intractabilities of the E-step of EM; it also handles non-smooth regularization and convex constraint sets, and the paper discusses the role of some design parameters.

Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

This work proposes a new stochastic gradient descent algorithm based on nested variance reduction that improves the best known gradient complexity of SVRG and the best gradient complexity of SCSG.

Minimizing finite sums with the stochastic average gradient

Numerical experiments indicate that the new SAG method often dramatically outperforms existing SG and deterministic gradient methods, and that the performance may be further improved through the use of non-uniform sampling strategies.
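The core of SAG is a table of the most recently seen gradient for each component, with steps taken along the table's average. A minimal sketch on a toy one-dimensional least-squares sum (step size and data are illustrative assumptions, not from the paper):

```python
import random

def sag(a, steps=2000, lr=0.1, seed=0):
    """Stochastic Average Gradient on f(x) = (1/n) * sum_i (x - a_i)**2 / 2:
    store the last gradient seen for each f_i and step along the
    average of the stored gradients."""
    rng = random.Random(seed)
    n = len(a)
    x = 0.0
    table = [0.0] * n        # last stored gradient of each f_i
    g_sum = 0.0              # running sum of the table
    for _ in range(steps):
        i = rng.randrange(n)
        g_new = x - a[i]     # gradient of f_i(x) = (x - a_i)**2 / 2
        g_sum += g_new - table[i]
        table[i] = g_new
        x -= lr * g_sum / n  # step along the average of stored gradients
    return x

# The minimizer of the average of (x - a_i)**2 / 2 is the mean of the a_i.
x_star = sag([1.0, 2.0, 3.0, 6.0])
```

Unlike plain SGD, the stored gradients stop changing at a stationary point, so the method converges without shrinking the step size.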

Variance Reduction for Faster Non-Convex Optimization

This work considers the fundamental problem in non-convex optimization of efficiently reaching a stationary point, and proposes a first-order minibatch stochastic method that converges with an $O(1/\varepsilon)$ rate, and is faster than full gradient descent by $\Omega(n^{1/3})$.
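Variance-reduced first-order methods of this family anchor each stochastic gradient against a periodically computed full gradient. A minimal SVRG-style sketch on a toy quadratic sum (illustrative assumptions throughout; this is not the paper's minibatch method):

```python
import random

def svrg(a, epochs=30, inner=40, lr=0.1, seed=0):
    """SVRG-style loop on f(x) = (1/n) * sum_i (x - a_i)**2 / 2:
    each epoch anchors a full gradient at x_ref, then corrects every
    stochastic gradient against that anchor."""
    rng = random.Random(seed)
    n = len(a)
    x = 0.0
    for _ in range(epochs):
        x_ref = x
        full_grad = sum(x_ref - ai for ai in a) / n   # anchor gradient
        for _ in range(inner):
            i = rng.randrange(n)
            # variance-reduced estimate: grad_i(x) - grad_i(x_ref) + anchor
            g = (x - a[i]) - (x_ref - a[i]) + full_grad
            x -= lr * g
    return x

x_star = svrg([1.0, 2.0, 3.0, 6.0])  # converges to the mean, 3.0
```

In this toy every component has the same curvature, so the correction cancels the sampling noise exactly; in general it only reduces the variance of the gradient estimate.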

On the Global Convergence of (Fast) Incremental Expectation Maximization Methods

This paper analyzes incremental and stochastic versions of the EM algorithm, as well as the variance-reduced version of [Chen et al., 2018], in a common unifying framework and establishes non-asymptotic bounds for global convergence.

Convergence Theorems for Generalized Alternating Minimization Procedures

This work studies EM variants in which the E-step is not performed exactly, either to obtain improved rates of convergence, or due to approximations needed to compute statistics under a model family over which E-steps cannot be realized.

A Lower Bound for the Optimization of Finite Sums

A lower bound for optimizing a finite sum of n functions, where each function is L-smooth and the sum is µ-strongly convex is presented, and upper bounds for recently developed methods specializing to this setting are compared.

On the choice of the number of blocks with the incremental EM algorithm for the fitting of normal mixtures

A simple rule is proposed for choosing the number of blocks with the IEM algorithm; the extreme case of one observation per block provides efficient updating formulas, which avoid the direct calculation of the inverses and determinants of the component-covariance matrices.

Generalized Majorization-Minimization

This work derives G-MM algorithms for several latent variable models and shows empirically that they consistently outperform their MM counterparts in optimizing non-convex objectives and appear to be less sensitive to initialization.

Mini-batch learning of exponential family finite mixture models

It is demonstrated that the mini-batch algorithm for mixtures of normal distributions can outperform the standard EM algorithm, and a scheme for the stochastic stabilization of the constructed mini-batch algorithms is proposed.

Convergence of the Monte Carlo expectation maximization for curved exponential families

The Monte Carlo expectation maximization (MCEM) algorithm is a versatile tool for inference in incomplete data models, especially when used in combination with Markov chain Monte Carlo simulation.