Stan: A Probabilistic Programming Language.
- B. Carpenter, A. Gelman, A. Riddell
- Journal of Statistical Software
- 11 January 2017
Stan is a probabilistic programming language for specifying statistical models that provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling.
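The Hamiltonian Monte Carlo machinery underlying Stan is built on the leapfrog integrator. A minimal sketch of that integrator follows (a generic illustration, not Stan's implementation; the standard-normal target, step size, and step count are assumptions for the demo):

```python
import numpy as np

def leapfrog(q, p, grad_log_p, step_size, n_steps):
    """Simulate one Hamiltonian trajectory with the leapfrog integrator."""
    q, p = q.copy(), p.copy()
    p += 0.5 * step_size * grad_log_p(q)      # initial half-step for momentum
    for _ in range(n_steps - 1):
        q += step_size * p                    # full position step
        p += step_size * grad_log_p(q)        # full momentum step
    q += step_size * p
    p += 0.5 * step_size * grad_log_p(q)      # final half-step for momentum
    return q, p

# Standard normal target: log p(q) = -0.5 q^2, so grad log p(q) = -q
grad = lambda q: -q
q0, p0 = np.array([1.0]), np.array([0.5])
q1, p1 = leapfrog(q0, p0, grad, step_size=0.1, n_steps=20)

# Leapfrog nearly conserves the Hamiltonian H = -log p(q) + 0.5 p^2,
# which is what keeps HMC acceptance rates high.
H0 = 0.5 * q0**2 + 0.5 * p0**2
H1 = 0.5 * q1**2 + 0.5 * p1**2
```

The integrator is also exactly time-reversible: rerunning it from `(q1, -p1)` returns to the starting point, a property HMC's acceptance step relies on.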
Stochastic variational inference
- M. Hoffman, D. Blei, Chong Wang, J. Paisley
- Journal of Machine Learning Research
- 29 June 2012
Stochastic variational inference lets us apply complex Bayesian models to massive data sets, and it is shown that a Bayesian nonparametric topic model outperforms its parametric counterpart.
The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo
- M. Hoffman, A. Gelman
- Journal of Machine Learning Research
- 18 November 2011
The No-U-Turn Sampler (NUTS) is introduced, an extension to HMC that eliminates the need to set a number of steps L, and a method is derived for adapting the step size parameter ε on the fly based on primal-dual averaging.
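The dual-averaging step-size adaptation can be sketched as follows: a running average of the gap between the target acceptance rate δ and the observed acceptance statistic drives the log step size, which is then smoothed toward a final estimate. The constants below follow defaults reported in the NUTS paper; the constant acceptance sequences are toy inputs:

```python
import math

def dual_averaging(alphas, delta=0.65, mu=0.0, gamma=0.05, t0=10.0, kappa=0.75):
    """Adapt a log step size toward a target acceptance rate delta.
    alphas: observed acceptance statistics, one per adaptation iteration."""
    h_bar, log_eps_bar = 0.0, 0.0
    for t, alpha in enumerate(alphas, start=1):
        eta = 1.0 / (t + t0)
        h_bar = (1.0 - eta) * h_bar + eta * (delta - alpha)   # running error
        log_eps = mu - math.sqrt(t) / gamma * h_bar           # primal iterate
        w = t ** -kappa
        log_eps_bar = w * log_eps + (1.0 - w) * log_eps_bar   # averaged iterate
    return math.exp(log_eps_bar)

# Acceptance consistently above target -> step size grows;
# consistently below target -> step size shrinks.
eps_hi = dual_averaging([0.95] * 100)
eps_lo = dual_averaging([0.20] * 100)
```

The decaying weight `t ** -kappa` makes the averaged iterate stabilize even while the primal iterate keeps reacting to new acceptance statistics.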
Online Learning for Latent Dirichlet Allocation
- M. Hoffman, D. Blei, F. Bach
- NIPS
- 6 December 2010
An online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA) based on online stochastic optimization with a natural gradient step is developed, which is shown to converge to a local optimum of the VB objective function.
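The core online update has a simple shape: each minibatch yields an estimate of the optimal variational parameter, and the natural-gradient step is an interpolation toward it with a Robbins-Monro step-size schedule. A minimal sketch (the toy noisy-target setup is an assumption for illustration, not the LDA model itself):

```python
import numpy as np

def robbins_monro_schedule(t, tau0=1.0, kappa=0.7):
    """Step size rho_t = (tau0 + t)^(-kappa); kappa in (0.5, 1] satisfies
    the Robbins-Monro conditions needed for convergence."""
    return (tau0 + t) ** -kappa

def online_vb_step(lam, lam_hat, rho):
    """Natural-gradient step on a variational parameter lambda:
    interpolate toward lam_hat, the optimum implied by the current minibatch."""
    return (1.0 - rho) * lam + rho * lam_hat

# Toy demo: recover a fixed target from noisy per-minibatch estimates.
rng = np.random.default_rng(0)
target = np.array([2.0, 5.0, 3.0])
lam = np.ones(3)
for t in range(1, 2001):
    lam_hat = target + 0.1 * rng.normal(size=3)   # noisy minibatch optimum
    lam = online_vb_step(lam, lam_hat, robbins_monro_schedule(t))
```

The decaying step size averages out minibatch noise while forgetting the initialization, which is why the iterate settles near the target.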
Variational Autoencoders for Collaborative Filtering
- Dawen Liang, R. G. Krishnan, M. Hoffman, T. Jebara
- The Web Conference
- 16 February 2018
A generative model with a multinomial likelihood that uses Bayesian inference for parameter estimation is introduced; the pros and cons of employing a principled Bayesian inference approach are identified, and settings where it provides the most significant improvements are characterized.
Stochastic Gradient Descent as Approximate Bayesian Inference
- S. Mandt, M. Hoffman, D. Blei
- Journal of Machine Learning Research
- 13 April 2017
It is demonstrated that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models, and a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler, is proposed.
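The paper's starting observation is that constant-step-size SGD does not converge to a point but reaches a stationary distribution around the optimum, so averaging its iterates estimates the optimum itself. A toy sketch of that view (the quadratic loss, noise scale, and burn-in length are assumptions for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, lr = 0.0, 0.05
samples = []

# Constant-step SGD on the quadratic loss 0.5 * (theta - 3)^2 with
# additive gradient noise: the iterates fluctuate around the optimum
# theta* = 3 rather than converging to it.
for t in range(5000):
    grad = (theta - 3.0) + rng.normal(scale=1.0)   # noisy gradient
    theta -= lr * grad
    if t >= 1000:                                  # discard burn-in
        samples.append(theta)

mean_est = np.mean(samples)   # iterate average estimates the optimum
```

The spread of the post-burn-in iterates grows with the learning rate and the gradient noise, which is the knob the paper analyzes when matching SGD's stationary distribution to a posterior.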
Learning Activation Functions to Improve Deep Neural Networks
- Forest Agostinelli, M. Hoffman, Peter Sadowski, P. Baldi
- International Conference on Learning…
- 21 December 2014
A novel form of piecewise linear activation function that is learned independently for each neuron using gradient descent is designed, achieving state-of-the-art performance on CIFAR-10, CIFAR-100, and a benchmark from high-energy physics involving Higgs boson decay modes.
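The adaptive piecewise linear unit from this line of work takes the form h(x) = max(0, x) + Σ_s a_s · max(0, −x + b_s), where the slopes a_s and hinge locations b_s are learned per neuron by gradient descent. A minimal sketch with fixed parameters for illustration (the learning itself is omitted):

```python
import numpy as np

def apl(x, a, b):
    """Adaptive piecewise linear unit:
    h(x) = max(0, x) + sum_s a[s] * max(0, -x + b[s]).
    a (slopes) and b (hinge locations) are learned in the paper;
    here they are fixed constants for the demo."""
    out = np.maximum(0.0, x)
    for a_s, b_s in zip(a, b):
        out += a_s * np.maximum(0.0, -x + b_s)
    return out

xs = np.linspace(-3.0, 3.0, 13)
relu_out = apl(xs, a=[0.0], b=[0.0])    # zero slopes recover plain ReLU
ident_out = apl(xs, a=[-1.0], b=[0.0])  # a = -1, b = 0 recovers the identity
```

The two special cases show why the parameterization is attractive: the family contains ReLU and the identity as interior points, so learning a_s and b_s can only enlarge the function class.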
Music Transformer: Generating Music with Long-Term Structure
- Cheng-Zhi Anna Huang, Ashish Vaswani, D. Eck
- International Conference on Learning…
- 2019
It is demonstrated that a Transformer with the modified relative attention mechanism can generate minute-long compositions with compelling structure, generate continuations that coherently elaborate on a given motif, and, in a seq2seq setup, generate accompaniments conditioned on melodies.
Sparse stochastic inference for latent Dirichlet allocation
- David Mimno, M. Hoffman, D. Blei
- International Conference on Machine Learning
- 26 June 2012
A hybrid algorithm for Bayesian topic models is presented that combines the efficiency of sparse Gibbs sampling with the scalability of online stochastic inference; it reduces the bias of variational inference and generalizes to many Bayesian hidden-variable models.
Nonparametric variational inference
- S. Gershman, M. Hoffman, D. Blei
- International Conference on Machine Learning
- 18 June 2012
The efficacy of the nonparametric approximation is demonstrated with a hierarchical logistic regression model and a nonlinear matrix factorization model, obtaining predictive performance as good as or better than more specialized variational methods and MCMC approximations.
...