Stephan Mandt

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic…
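A minimal sketch of this idea, assuming a toy Bayesian linear regression with synthetic data: run SGD with a constant (never decayed) learning rate, discard a burn-in phase, and treat the remaining iterates as approximate posterior samples. The step size, batch size, prior term, and burn-in cutoff below are illustrative choices, not the paper's analyzed settings.

```python
import numpy as np

# Toy data: linear regression with known weights (illustrative assumption).
rng = np.random.default_rng(0)
N, D = 1000, 2
X = rng.normal(size=(N, D))
w_true = np.array([1.0, -2.0])
y = X @ w_true + rng.normal(scale=0.5, size=N)

lr = 0.01      # constant learning rate, never decayed
batch = 10
w = np.zeros(D)
samples = []
for t in range(5000):
    idx = rng.integers(0, N, size=batch)
    # minibatch gradient of squared loss plus a weak Gaussian prior (illustrative)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch + w / N
    w = w - lr * grad
    if t > 1000:               # discard burn-in, keep stationary-phase iterates
        samples.append(w.copy())

samples = np.array(samples)
print("posterior mean estimate:", samples.mean(axis=0))
print("posterior std estimate: ", samples.std(axis=0))
```

The spread of the stationary iterates is controlled by the learning rate and batch size, which is what makes the constant-rate process usable as an approximate sampler.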
Variational inference (VI) combined with data subsampling enables approximate posterior inference with large data sets for otherwise intractable models, but suffers from poor local optima. We first formulate a deterministic annealing approach for the generic class of conditionally conjugate exponential family models. This algorithm uses a temperature…
Stochastic variational inference (SVI) maps posterior inference in latent variable models to non-convex stochastic optimization. While variational inference methods enable approximate posterior inference for many otherwise intractable models, they suffer from local optima. To overcome this issue, we introduce deterministic annealing for SVI. We introduce a…
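As a toy illustration of the deterministic-annealing idea in the two abstracts above, the sketch below tempers the E-step of an EM-style update for a 1-D Gaussian mixture: at temperature T the responsibilities are flattened by a factor 1/T, and T is lowered toward 1 on a schedule, smoothing the objective early on so the optimizer avoids poor local optima. This is a simplified stand-in for the papers' variational-inference derivation; the model, schedule, and data are assumptions for illustration.

```python
import numpy as np

# Toy 1-D mixture of two unit-variance Gaussians (illustrative data).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])
K = 2
mu = rng.normal(size=K)                  # random initialization

for T in [8.0, 4.0, 2.0, 1.0]:           # annealing schedule, ending at T = 1
    for _ in range(20):
        # tempered E-step: log-responsibilities scaled by 1/T
        logr = -0.5 * (x[:, None] - mu[None, :]) ** 2 / T
        r = np.exp(logr - logr.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means
        mu = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)

print("recovered component means:", np.sort(mu))
```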
Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, a class of methods that extends the idea of word embeddings to other types of high-dimensional data. As examples, we study neural data with real-valued observations, count data from a market…
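A hedged sketch of the exponential family embedding idea for count data: each count is modeled by a Poisson whose natural parameter is the inner product of the item's embedding vector with the sum of its context vectors. The vocabulary size, context rule (immediate neighbors), embedding dimension, and learning rate below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(2)
V, D = 50, 8                            # items, embedding dimension (assumed)
rho = 0.1 * rng.normal(size=(V, D))     # embedding vectors
alpha = 0.1 * rng.normal(size=(V, D))   # context vectors
x = rng.poisson(1.0, size=V)            # toy count data
lr = 0.05

for epoch in range(200):
    for i in range(V):
        ctx = [(i - 1) % V, (i + 1) % V]    # assumed context: neighbors
        c = alpha[ctx].sum(axis=0)
        eta = rho[i] @ c                    # natural parameter
        lam = np.exp(eta)                   # Poisson mean
        g = x[i] - lam                      # d loglik / d eta
        grad_rho, grad_alpha = g * c, g * rho[i]
        rho[i] += lr * grad_rho             # gradient ascent on loglik
        for j in ctx:
            alpha[j] += lr * grad_alpha

# report the fitted (unnormalized) Poisson log-likelihood
loglik = 0.0
for i in range(V):
    c = alpha[[(i - 1) % V, (i + 1) % V]].sum(axis=0)
    eta = rho[i] @ c
    loglik += x[i] * eta - np.exp(eta)
print("final (unnormalized) log-likelihood:", loglik)
```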
Among the goals of statistical genetics is to find associations between genetic data and binary phenotypes, such as heritable diseases. Often, the data are confounded by factors such as age, ethnicity, or population structure. Linear mixed models are linear regression models that correct for confounding by means of correlated label noise; they are…
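The correlated-noise correction can be sketched as generalized least squares: labels follow y = Xb + u + e with u ~ N(0, s²K) for a similarity (e.g. kinship) matrix K, and the fixed effects b are estimated under the resulting covariance. The matrix K, the variance components, and the data below are synthetic assumptions; real LMM software also estimates the variances rather than fixing them.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 200, 3
X = rng.normal(size=(N, D))
Z = rng.normal(size=(N, 10))
K = Z @ Z.T / 10                        # toy similarity (kinship) matrix
u = rng.multivariate_normal(np.zeros(N), 0.5 * K)   # correlated confounding noise
b_true = np.array([1.0, 0.0, -1.0])
y = X @ b_true + u + rng.normal(scale=0.3, size=N)

# generalized least squares under the (here, assumed known) noise covariance
C = 0.5 * K + 0.09 * np.eye(N)
Ci_X = np.linalg.solve(C, X)
Ci_y = np.linalg.solve(C, y)
b_gls = np.linalg.solve(X.T @ Ci_X, X.T @ Ci_y)
print("GLS estimate of fixed effects:", b_gls)
```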
Stochastic variational inference (SVI) enables approximate posterior inference with large data sets for otherwise intractable models, but like all variational inference algorithms it suffers from local optima. Deterministic annealing, which we formulate here for the generic class of conditionally conjugate exponential family models, uses a temperature…
Probit regression and logistic regression are well-known models for classification. In contrast to logistic regression, probit regression has a canonical generalization that allows us to model correlations between the labels. This offers a way to include metadata in the model by correlating the noisy observation process. We show that the approach leads to the…
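A generative sketch of the correlated-probit idea, under assumed toy metadata: latent utilities are drawn from a multivariate Gaussian whose covariance couples labels that share metadata (here, group membership), and the observed labels are the signs of the utilities. Standard probit regression is the special case of an identity covariance; inference for the correlated model is beyond this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 100, 2
X = rng.normal(size=(N, D))
b = np.array([1.5, -1.0])                       # illustrative true weights
groups = rng.integers(0, 5, size=N)             # assumed metadata: group membership
# covariance couples labels within the same group; identity recovers plain probit
Sigma = np.eye(N) + 0.5 * (groups[:, None] == groups[None, :])
z = rng.multivariate_normal(X @ b, Sigma)       # correlated latent utilities
y = (z > 0).astype(int)                         # observed binary labels
print("label balance:", y.mean())
```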