Stephan Mandt

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling.
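To make this view concrete, here is a minimal sketch, with an assumed Bayesian linear regression model and illustrative step size and batch size rather than the paper's exact settings: run SGD with a constant learning rate on a subsampled negative log-posterior, discard a burn-in phase, and read the later iterates as approximate posterior samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic Bayesian linear regression: y = X @ w_true + noise.
N, D = 1000, 3
X = rng.normal(size=(N, D))
w_true = rng.normal(size=D)
y = X @ w_true + 0.5 * rng.normal(size=N)

def grad_neg_log_posterior(w, idx, prior_var=10.0, noise_var=0.25):
    """Minibatch gradient of -log p(w | data), rescaled by N / |batch|
    so it is an unbiased estimate of the full-data gradient."""
    Xb, yb = X[idx], y[idx]
    lik = (N / len(idx)) * Xb.T @ (Xb @ w - yb) / noise_var
    return lik + w / prior_var

eta = 1e-4                     # constant learning rate (illustrative)
steps, burn_in, batch = 20_000, 5_000, 32
w = np.zeros(D)
samples = []
for t in range(steps):
    idx = rng.choice(N, size=batch, replace=False)
    w = w - eta * grad_neg_log_posterior(w, idx)
    if t >= burn_in:           # past burn-in: iterates ~ stationary dist.
        samples.append(w.copy())

samples = np.asarray(samples)
print("posterior-sample mean:", samples.mean(axis=0))
print("true weights:         ", w_true)
```

The constant step size governs the spread of the stationary distribution: smaller steps concentrate the iterates near the mode, larger ones inflate them, which is what makes the learning rate a tunable knob for approximate inference.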
Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, a class of methods that extends the idea of word embeddings to other types of high-dimensional data. As examples, we study neural data with real-valued observations and count data from a market basket analysis.
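To illustrate the recipe on one of these data types, below is a minimal sketch of an embedding with Poisson conditionals for count data. The notation (rho for embedding vectors, alpha for context vectors), the toy basket, and the single-update training step are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

rng = np.random.default_rng(1)

V, K = 50, 8                           # vocabulary size, embedding dim
rho = 0.1 * rng.normal(size=(V, K))    # embedding vectors
alpha = 0.1 * rng.normal(size=(V, K))  # context vectors

def poisson_cond_loglik(x, i, context_counts):
    """log p(x_i = x | context): a Poisson whose log-rate couples item
    i's embedding to the count-weighted sum of context vectors."""
    c = context_counts @ alpha         # (K,) summed context
    lam = np.exp(rho[i] @ c)
    return x * np.log(lam) - lam       # dropping the constant -log(x!)

def sgd_step(x, i, context_counts, lr=0.01):
    """One stochastic gradient ascent step on rho[i] (alpha analogous)."""
    c = context_counts @ alpha
    lam = np.exp(rho[i] @ c)
    rho[i] += lr * (x - lam) * c       # Poisson score times context

# Toy "market basket": counts over the vocabulary; model item 3's
# count conditioned on the rest of the basket.
basket = rng.poisson(0.2, size=V)
ctx = basket.copy()
ctx[3] = 0                             # exclude the item itself
sgd_step(basket[3], 3, ctx)
print(poisson_cond_loglik(basket[3], 3, ctx))
```

Swapping the Poisson for a Gaussian (real-valued neural data) or another exponential family changes only the conditional likelihood and its score; the embedding/context structure stays the same.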
Variational inference (VI) combined with data subsampling enables approximate posterior inference for otherwise intractable models on large data sets, but suffers from poor local optima. We first formulate a deterministic annealing approach for the generic class of conditionally conjugate exponential family models. This algorithm uses a temperature parameter that deterministically deforms the objective, and reduces this deformation over the course of the optimization.
Stochastic variational inference (SVI) maps posterior inference in latent variable models to non-convex stochastic optimization. While SVI enables approximate posterior inference for many otherwise intractable models, variational inference methods suffer from local optima. We introduce deterministic annealing for SVI to overcome this issue.
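The device both of the preceding abstracts describe can be pictured as a tempered variational objective whose entropy term is scaled by a temperature that decays toward one. The sketch below uses an assumed bimodal toy target, a Gaussian variational family, and an exponential decay schedule; it illustrates the idea rather than reproducing either paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_joint(z):
    """Toy unnormalized target: mixture of two well-separated Gaussians."""
    return np.logaddexp(-0.5 * (z - 4.0) ** 2, -0.5 * (z + 4.0) ** 2)

def tempered_elbo_grad(mu, log_s, T, n=64):
    """Reparameterized MC gradient of E_q[log p(z)] + T * H[q] for
    q = Normal(mu, exp(log_s)^2); the entropy is log_s + const."""
    eps = rng.normal(size=n)
    s = np.exp(log_s)
    z = mu + s * eps
    # Finite-difference score of log_joint keeps the sketch dependency-free.
    h = 1e-4
    dlogp = (log_joint(z + h) - log_joint(z - h)) / (2 * h)
    g_mu = dlogp.mean()
    g_log_s = (dlogp * eps).mean() * s + T   # dH/dlog_s = 1, scaled by T
    return g_mu, g_log_s

mu, log_s = 0.1, np.log(3.0)
for t in range(2000):
    T = max(1.0, 10.0 * 0.997 ** t)          # anneal the temperature: 10 -> 1
    g_mu, g_log_s = tempered_elbo_grad(mu, log_s, T)
    mu += 0.05 * g_mu
    log_s += 0.05 * g_log_s

print(f"final q: mean={mu:.2f}, std={np.exp(log_s):.2f}")
```

At high temperature the entropy term dominates, so q stays broad enough to feel both modes (its standard deviation settles near sqrt(T) here); as T decays to one, the objective reduces to the ordinary ELBO and q commits to a mode instead of collapsing early into a poor local optimum.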
Among the goals of statistical genetics is to find sparse associations of genetic data with binary phenotypes, such as heritable diseases. Often, the data are obfuscated by confounders such as age, ancestry, or population structure. A widely appreciated modeling paradigm that corrects for such confounding relies on linear mixed models.
Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they allow practitioners to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity, and population structure.
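A minimal sketch of sparse feature selection under an LMM, in the spirit of the two abstracts above but for the continuous-phenotype case: model the confounding as a random effect with covariance proportional to a similarity ("kinship") matrix K, whiten the data using K's eigendecomposition, and fit an L1-penalized regression on the rotated problem. The toy kinship, the fixed variance components (estimated in practice), and the plain ISTA solver are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

N, D = 200, 50
X = rng.normal(size=(N, D))
Z = rng.normal(size=(N, 5))                  # hidden structure (e.g. ancestry)
K = Z @ Z.T / 5 + 1e-3 * np.eye(N)           # toy "kinship" matrix

beta_true = np.zeros(D)
beta_true[:3] = [2.0, -1.5, 1.0]             # three causal features
u = rng.multivariate_normal(np.zeros(N), 0.5 * K)   # confounding random effect
y = X @ beta_true + u + 0.3 * rng.normal(size=N)

# Rotate so the noise covariance sg2*K + se2*I becomes the identity.
sg2, se2 = 0.5, 0.09                         # fixed here for brevity
evals, U = np.linalg.eigh(K)
wts = 1.0 / np.sqrt(sg2 * evals + se2)
Xr = (U.T @ X) * wts[:, None]
yr = (U.T @ y) * wts

# ISTA (proximal gradient) for the Lasso on the whitened problem.
lam = 150.0
L = np.linalg.norm(Xr, 2) ** 2               # Lipschitz constant of the gradient
beta = np.zeros(D)
for _ in range(500):
    b = beta - Xr.T @ (Xr @ beta - yr) / L
    beta = np.sign(b) * np.maximum(np.abs(b) - lam / L, 0.0)

print("selected features:", np.flatnonzero(np.abs(beta) > 1e-3))
```

The whitening step is what "corrects for confounding": directions along the kinship structure are down-weighted, so the sparse fit cannot absorb the confounders into the genetic coefficients.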
Probit regression and logistic regression are well-known models for classification. In contrast to logistic regression, probit regression has a canonical generalization that allows us to model correlations between the labels. This provides a way to include metadata in the model that correlate the noisy observation process.
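The sketch below illustrates the generative picture this abstract suggests: labels arise by thresholding a latent Gaussian whose covariance, assembled here from a scalar metadata field via an assumed RBF similarity, correlates the observation noise. The tiny problem size keeps a brute-force Monte Carlo likelihood feasible.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

N, D = 5, 3
X = rng.normal(size=(N, D))
w = rng.normal(size=D)

# Metadata-driven covariance: similar metadata -> correlated noise.
meta = rng.uniform(0, 10, size=N)             # e.g. age (hypothetical field)
Sigma = 0.7 * np.exp(-0.5 * (meta[:, None] - meta[None, :]) ** 2) \
        + 0.3 * np.eye(N)

# Generative model: z ~ N(X @ w, Sigma), y_i = sign(z_i).
z = rng.multivariate_normal(X @ w, Sigma)
y = np.sign(z)

# Ordinary probit treats the labels as independent:
indep = norm.cdf(y * (X @ w) / np.sqrt(np.diag(Sigma))).prod()

# With correlations, p(y | X, w, Sigma) is a Gaussian orthant
# probability; estimate it by brute-force Monte Carlo.
zs = rng.multivariate_normal(X @ w, Sigma, size=200_000)
joint = (np.sign(zs) == y).all(axis=1).mean()

print(f"independent-probit likelihood: {indep:.4f}")
print(f"correlated (MC) likelihood:    {joint:.4f}")
```

With a diagonal Sigma the joint likelihood factorizes into the ordinary probit terms; the off-diagonal structure is exactly what lets nearby metadata values pull their labels' noise together, which logistic regression has no canonical way to express.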