# Firefly Monte Carlo: Exact MCMC with Subsets of Data

@inproceedings{Maclaurin2014FireflyMC, title={Firefly Monte Carlo: Exact MCMC with Subsets of Data}, author={Dougal Maclaurin and Ryan P. Adams}, booktitle={Conference on Uncertainty in Artificial Intelligence}, year={2014} }

Markov chain Monte Carlo (MCMC) is a popular tool for Bayesian inference. However, MCMC cannot be practically applied to large data sets because of the prohibitive cost of evaluating every likelihood term at every iteration. Here we present Firefly Monte Carlo (FlyMC), an MCMC algorithm with auxiliary variables that queries the likelihoods of only a subset of the data at each iteration yet simulates from the exact posterior distribution. FlyMC is compatible with modern MCMC algorithms, and only…
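The auxiliary-variable idea in the abstract can be illustrated with a minimal sketch, under stated assumptions: a toy logistic regression, a quadratic (Jaakkola-Jordan style) lower bound on each log-likelihood term that is tight at `xi`, a random-walk Metropolis update for the parameters, and a Gibbs refresh of a random fraction of the brightness variables `z`. The function names, step size, `resample_frac`, and prior variance are illustrative choices, not the authors' code. Only "bright" points (`z[n] == True`) need their exact likelihood evaluated in the parameter update; the product of all bounds collapses into precomputed sufficient statistics.

```python
import numpy as np


def log_sigmoid(s):
    """Numerically stable log of the logistic function."""
    return -np.logaddexp(0.0, -s)


def bound_coeffs(xi=1.5):
    """Coefficients of log B(s) = a*s**2 + b*s + c, a lower bound on
    log_sigmoid(s) that touches it at s = +xi and s = -xi."""
    a = -np.tanh(xi / 2.0) / (4.0 * xi)
    b = 0.5
    c = -a * xi**2 + xi / 2.0 - np.logaddexp(0.0, xi)
    return a, b, c


def flymc_logistic(X, t, n_iter=2000, step=0.05, resample_frac=0.1,
                   prior_var=10.0, xi=1.5, seed=0):
    """Toy FlyMC-style sampler. X has shape (N, D); t holds labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    a, b, c = bound_coeffs(xi)
    S = X.T @ X          # sufficient statistics: the sum of all log-bounds
    v = X.T @ t          # is a quadratic form in theta (note t**2 == 1)

    def log_gap(theta, idx):
        # log L_n - log B_n >= 0 for the points in idx (clipped for round-off)
        s = t[idx] * (X[idx] @ theta)
        return np.maximum(log_sigmoid(s) - (a * s**2 + b * s + c), 0.0)

    def log_joint(theta, bright):
        log_prior = -0.5 * (theta @ theta) / prior_var
        log_bounds = a * (theta @ S @ theta) + b * (v @ theta) + N * c
        with np.errstate(divide="ignore"):
            # bright points contribute log(L_n / B_n - 1)
            log_bright = np.sum(np.log(np.expm1(log_gap(theta, bright))))
        return log_prior + log_bounds + log_bright

    theta = np.zeros(D)
    z = np.zeros(N, dtype=bool)      # all points start dark
    samples = np.empty((n_iter, D))
    n_bright = np.empty(n_iter, dtype=int)
    for it in range(n_iter):
        # Gibbs refresh of a random subset: P(z_n = 1 | theta) = 1 - B_n / L_n
        idx = rng.choice(N, size=max(1, int(resample_frac * N)), replace=False)
        z[idx] = rng.random(idx.size) < -np.expm1(-log_gap(theta, idx))
        bright = np.flatnonzero(z)
        # Random-walk Metropolis on theta given z: touches bright points only
        prop = theta + step * rng.standard_normal(D)
        if np.log(rng.random()) < log_joint(prop, bright) - log_joint(theta, bright):
            theta = prop
        samples[it] = theta
        n_bright[it] = bright.size
    return samples, n_bright
```

Calling `flymc_logistic(X, t)` returns the parameter draws together with the number of bright points per iteration; how small that number stays depends on how tight the bound is near the posterior mode.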

## 162 Citations

### An algorithm for distributed Bayesian inference

- Computer Science
- Stat
- 2021

A scalable extension of Monte Carlo algorithms using the divide‐and‐conquer (D&C) technique that divides the data into a sufficiently large number of subsets, draws parameters in parallel on the subsets using a powered likelihood and produces Monte Carlo draws of the parameter by combining parameter draws obtained from each subset.

### No Free Lunch for Approximate MCMC

- Computer Science
- 2020

It is pointed out that well-known MCMC convergence results often imply that these "subsampling" MCMC algorithms cannot greatly improve performance, and generic results are applied to realistic statistical problems and proposed algorithms.

### An Approximate MCMC Method for Convex Hulls

- Computer Science
- 2019

The initial work in this thesis defines a data-augmentation algorithm along the lines of FlyMC, which uses a pseudo-marginal algorithm (PMMH) to replace the distribution of the parameter of interest, conditional on the augmented variable, with an estimator, and introduces an auxiliary random variable to mark subsets.

### An Algorithm for Distributed Bayesian Inference in Generalized Linear Models

- Computer Science
- 2019

This work develops a scalable extension of Monte Carlo algorithms using the divide-and-conquer technique that divides the data into a sufficiently large number of subsets, draws parameters in parallel on the subsets using a powered likelihood, and produces Monte Carlo draws of the parameter by combining parameter draws obtained from each subset.
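The divide-and-conquer recipe summarized above can be sketched on a toy problem. The following is a minimal illustration under loud assumptions: a Gaussian mean with unit observation variance and a conjugate Gaussian prior, so each subset posterior under the powered likelihood is available in closed form (standing in for a per-subset MCMC run), and a combination step that averages sorted draws across subsets. That combination rule is a simple stand-in chosen for this one-dimensional example and is not claimed to be the rule used in the paper; all function names are hypothetical.

```python
import numpy as np


def subset_posterior_draws(x_subset, n_subsets, n_draws, prior_var, rng):
    """Draws from the posterior of a Gaussian mean when the subset likelihood
    is raised to the power n_subsets (unit observation variance assumed)."""
    m = x_subset.size
    precision = 1.0 / prior_var + n_subsets * m
    mean = n_subsets * x_subset.sum() / precision
    return rng.normal(mean, np.sqrt(1.0 / precision), size=n_draws)


def combine_by_quantile_averaging(draws_per_subset):
    """Average the sorted draws across subsets (rows = subsets)."""
    return np.mean(np.sort(draws_per_subset, axis=1), axis=0)


def divide_and_conquer(x, n_subsets=10, n_draws=4000, prior_var=10.0, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly partition the data into subsets of (nearly) equal size
    subsets = np.array_split(rng.permutation(x), n_subsets)
    # In practice each subset would be handled by a separate worker
    draws = np.stack([
        subset_posterior_draws(s, n_subsets, n_draws, prior_var, rng)
        for s in subsets
    ])
    return combine_by_quantile_averaging(draws)
```

Raising each subset likelihood to the power `n_subsets` gives every subset posterior roughly the spread of the full-data posterior, which is why the combined draws in this toy case track the full posterior rather than an over-dispersed version of it.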

### Parallelizing MCMC with Random Partition Trees

- Computer Science
- NIPS
- 2015

A new EP-MCMC algorithm, PART, is proposed that applies random partition trees to combine the subset posterior draws; it is distribution-free, easy to resample from, and can adapt to multiple scales.

### Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets

- Computer Science
- ICML
- 2019

The Scalable Metropolis-Hastings (SMH) kernel is proposed, a kernel that exploits Gaussian concentration of the posterior to require processing on average only $O(1)$ or even $O(1/\sqrt{n})$ data points per step.

### Variational Consensus Monte Carlo

- Computer Science
- NIPS
- 2015

Practitioners of Bayesian statistics have long depended on Markov chain Monte Carlo (MCMC) to obtain samples from intractable posterior distributions. Unfortunately, MCMC algorithms are typically…

### On Markov chain Monte Carlo methods for tall data

- Computer Science
- J. Mach. Learn. Res.
- 2017

An original subsampling-based approach is proposed which samples from a distribution provably close to the posterior distribution of interest, yet can require less than $O(n)$ data point likelihood evaluations at each iteration for certain statistical models in favourable scenarios.

### Speeding Up MCMC by Efficient Data Subsampling

- Computer Science, Mathematics
- Journal of the American Statistical Association
- 2018

It is shown that subsampling Markov chain Monte Carlo is substantially more efficient than standard MCMC in terms of sampling efficiency for a given computational budget, and that it outperforms other subsampling methods for MCMC proposed in the literature.

### Comparing consensus Monte Carlo strategies for distributed Bayesian computation

- Computer Science
- 2017

It is found that resampling and kernel density based methods break down after 10 or sometimes fewer dimensions, while the new mixture-based approach works well, although the mixture models it requires take too long to fit.

## References

### CODA: convergence diagnosis and output analysis for MCMC

- Computer Science
- 2006

The coda package for R supports Bayesian inference with Markov chain Monte Carlo; it contains a set of functions designed to help the user answer questions about how many samples are required to accurately estimate posterior quantities of interest.

### Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget

- Computer Science
- ICML 2014
- 2013

This work introduces an approximate MH rule based on a sequential hypothesis test that allows us to accept or reject samples with high confidence using only a fraction of the data required for the exact MH rule.

### The pseudo-marginal approach for efficient Monte Carlo computations

- Computer Science
- 2009

A powerful and flexible MCMC algorithm for stochastic simulation that builds on a pseudo-marginal method, showing how algorithms that are approximations to an idealized marginal algorithm can share the same marginal stationary distribution as the idealized method.

### Bayesian Learning via Stochastic Gradient Langevin Dynamics

- Computer Science
- ICML
- 2011

In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic…
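As an illustration of mini-batch learning with injected noise, here is a minimal sketch of a stochastic gradient Langevin update on a toy problem. The assumptions are loud ones: a Gaussian mean with unit observation variance, a Gaussian prior, and a polynomially decaying step size whose constants (`eps0`, the decay exponent) are arbitrary illustrative choices rather than values from the paper. Each step rescales the mini-batch gradient of the log-likelihood to the full data size, takes half a step along it, and adds Gaussian noise whose variance equals the step size.

```python
import numpy as np


def sgld_gaussian_mean(x, n_iter=2000, batch_size=50, eps0=1e-4,
                       prior_var=10.0, seed=0):
    """Toy SGLD for the mean of unit-variance Gaussian data."""
    rng = np.random.default_rng(seed)
    N = x.size
    theta = 0.0
    samples = np.empty(n_iter)
    for t in range(n_iter):
        eps = eps0 * (1.0 + t / 100.0) ** -0.55      # decaying step size
        batch = rng.choice(x, size=batch_size, replace=False)
        grad_prior = -theta / prior_var              # d/dtheta log N(theta; 0, prior_var)
        grad_lik = (N / batch_size) * np.sum(batch - theta)  # rescaled mini-batch gradient
        noise = np.sqrt(eps) * rng.standard_normal() # injected noise, variance eps
        theta = theta + 0.5 * eps * (grad_prior + grad_lik) + noise
        samples[t] = theta
    return samples
```

With a step size this small relative to the data size, the iterates settle near the data mean; the spread of the draws reflects both the injected noise and the mini-batch gradient noise, so it should not be read as an exact posterior standard deviation.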

### Black Box Variational Inference

- Computer Science
- AISTATS
- 2014

This paper presents a "black box" variational inference algorithm, one that can be quickly applied to many models with little additional derivation, based on a stochastic optimization of the variational objective where the noisy gradient is computed from Monte Carlo samples from the variational distribution.

### Stochastic variational inference

- Computer Science
- J. Mach. Learn. Res.
- 2013

Stochastic variational inference lets us apply complex Bayesian models to massive data sets, and it is shown that the Bayesian nonparametric topic model outperforms its parametric counterpart.

### Slice Sampling

- Mathematics
- The Annals of Statistics
- 2003

Markov chain sampling methods that adapt to characteristics of the distribution being sampled can be constructed using the principle that one can sample from a distribution by sampling uniformly from…
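A univariate slice sampler with a stepping-out interval and shrinkage is one common way to turn this principle into code. The sketch below is a minimal version under stated assumptions: the target is supplied as an unnormalized log-density `log_f`, the initial width `w` and the step limit `max_steps` are illustrative tuning values, and the function name is hypothetical.

```python
import numpy as np


def slice_sample(log_f, x0, n_samples, w=1.0, max_steps=50, seed=0):
    """Univariate slice sampling with stepping out and shrinkage."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    out = np.empty(n_samples)
    for i in range(n_samples):
        # Auxiliary height: uniform under the density at x, kept in log space
        log_y = log_f(x) + np.log(rng.random())
        # Randomly place an interval of width w around x
        left = x - w * rng.random()
        right = left + w
        # Split the step budget at random between the two sides
        j = int(rng.integers(0, max_steps))
        k = max_steps - 1 - j
        while j > 0 and log_f(left) > log_y:
            left -= w
            j -= 1
        while k > 0 and log_f(right) > log_y:
            right += w
            k -= 1
        # Shrinkage: propose uniformly, shrink the interval toward x on rejection
        while True:
            x_new = rng.uniform(left, right)
            if log_f(x_new) > log_y:
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        x = x_new
        out[i] = x
    return out
```

For example, `slice_sample(lambda x: -0.5 * x * x, 0.0, 5000)` draws from a standard normal. The interval adapts to the local scale of the density, so `w` mainly affects the number of density evaluations rather than correctness.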

### Accelerating MCMC via Parallel Predictive Prefetching

- Computer Science
- UAI
- 2014

This work speculatively evaluates many potential steps of an MCMC chain in parallel while exploiting fast, iterative approximations to the target density, and achieves speedup close to linear in the number of available cores.

### Optimal scaling of discrete approximations to Langevin diffusions

- Computer Science, Mathematics
- 1998

An asymptotic diffusion limit theorem is proved and it is shown that, as a function of dimension $n$, the complexity of the algorithm is $O(n^{1/3})$, which compares favourably with the $O(n)$ complexity of random walk Metropolis algorithms.

### Weak convergence and optimal scaling of random walk Metropolis algorithms

- Mathematics
- 1997

This paper considers the problem of scaling the proposal distribution of a multidimensional random walk Metropolis algorithm in order to maximize the efficiency of the algorithm. The main result is a…