Stephan Mandt

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic …
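A minimal sketch of the idea in this abstract: run SGD with a fixed learning rate on a simple quadratic loss and, after a burn-in phase, treat the iterates as draws from the stationary distribution. The model, constants, and burn-in length below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 2
X = rng.normal(size=(N, D))
true_w = np.array([1.0, -2.0])
y = X @ true_w + 0.5 * rng.normal(size=N)

w = np.zeros(D)
lr = 0.05                     # constant learning rate; larger lr -> wider stationary spread
batch = 10
burn_in, n_iters = 2000, 10000
samples = []

for t in range(n_iters):
    idx = rng.integers(0, N, size=batch)
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / batch   # stochastic gradient on a mini-batch
    w = w - lr * grad
    if t >= burn_in:
        samples.append(w.copy())                      # iterates as posterior-like samples

samples = np.array(samples)
print("posterior-like mean:", samples.mean(axis=0))
print("posterior-like std: ", samples.std(axis=0))
```

The spread of the collected iterates shrinks with the learning rate, which is what makes a constant-rate SGD chain usable as a tunable approximate sampler.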
Variational inference (VI) combined with data subsampling enables approximate posterior inference with large data sets for otherwise intractable models, but suffers from poor local optima. We first formulate a deterministic annealing approach for the generic class of conditionally conjugate exponential family models. This algorithm uses a temperature …
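One common form of deterministic annealing in VI scales the entropy term of the variational objective by a temperature that is cooled toward 1, smoothing the landscape early on. The sketch below uses that form on a toy bimodal target with a Gaussian variational family; the target, schedule, and finite-difference gradients are illustrative assumptions, not the paper's conditionally conjugate setting.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_p(z):
    # unnormalized bimodal target: mixture of N(-3, 1) and N(3, 1)
    return np.logaddexp(-0.5 * (z + 3.0) ** 2, -0.5 * (z - 3.0) ** 2)

def annealed_elbo(m, log_s, T, noise):
    z = m + np.exp(log_s) * noise                    # reparameterized samples from q
    entropy = 0.5 * np.log(2.0 * np.pi * np.e) + log_s
    return log_p(z).mean() + T * entropy             # entropy scaled by temperature T

m, log_s, eps, lr = 0.0, 0.0, 1e-4, 0.05
for T in np.linspace(5.0, 1.0, 40):                  # cooling schedule down to T = 1
    for _ in range(200):
        noise = rng.normal(size=256)                 # common random numbers for both evals
        gm = (annealed_elbo(m + eps, log_s, T, noise)
              - annealed_elbo(m - eps, log_s, T, noise)) / (2 * eps)
        gs = (annealed_elbo(m, log_s + eps, T, noise)
              - annealed_elbo(m, log_s - eps, T, noise)) / (2 * eps)
        m, log_s = m + lr * gm, log_s + lr * gs

print("q mean:", round(m, 3), "q std:", round(float(np.exp(log_s)), 3))
```

At high temperature the entropy term keeps q broad enough to cover both modes; as T cools to 1 the objective becomes the standard ELBO and q commits to one mode rather than collapsing prematurely.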
Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, a class of methods that extends the idea of word embeddings to other types of high-dimensional data. As examples, we study neural data with real-valued observations, count data from a market …
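To make the idea concrete, here is a sketch of one instance: a Poisson embedding for count data in the spirit of the market-basket example, where each item's count is modeled given the other counts in the basket through an embedding vector and a per-item context vector. The context definition, dimensions, toy data, and the choice to update only the embeddings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, dim = 20, 5
X = rng.poisson(1.0, size=(200, n_items)).astype(float)   # toy basket counts

rho = 0.01 * rng.normal(size=(n_items, dim))     # embedding vectors
alpha = 0.01 * rng.normal(size=(n_items, dim))   # context vectors
lr = 0.001

for epoch in range(10):
    for x in X:
        for i in range(n_items):
            # context: embedding-weighted sum of the *other* items' counts
            ctx = alpha.T @ x - alpha[i] * x[i]
            eta = np.clip(rho[i] @ ctx, -10.0, 10.0)  # natural parameter (clipped for stability)
            lam = np.exp(eta)                         # Poisson rate
            g = x[i] - lam                            # d log-likelihood / d eta
            rho[i] += lr * g * ctx                    # SGD step on the conditional log-likelihood
            # (a full implementation would also update alpha and add regularization)

print("example embedding:", rho[0])
```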
One goal of statistical genetics is to find sparse associations of genetic data with binary phenotypes, such as heritable diseases. Often, the data are obfuscated by confounders such as age, ancestry, or population structure. A widely appreciated modeling paradigm that corrects for such confounding relies on linear mixed models. These are linear …
Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they allow practitioners to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity, and population structure. Formulated as models …
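A minimal sketch of the LMM setup described in the two abstracts above: model y = Xw + u + eps with a random effect u ~ N(0, sg2 * K), where K is a kinship or similarity matrix capturing population structure. Rotating by K's eigenvectors and rescaling turns the correlated noise into i.i.d. noise, after which a sparse (Lasso) fit selects features. The toy data and fixed variance components are assumptions; a real implementation would estimate them, e.g. by restricted maximum likelihood.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 200, 50
X = rng.normal(size=(n, p))
K = X[:, :5] @ X[:, :5].T / 5 + np.eye(n) * 0.1   # toy relatedness matrix (positive definite)
w_true = np.zeros(p); w_true[:3] = 1.0            # only the first 3 features matter
u = rng.multivariate_normal(np.zeros(n), 0.5 * K) # confounding random effect
y = X @ w_true + u + 0.3 * rng.normal(size=n)

sg2, se2 = 0.5, 0.09                              # assumed (not estimated) variance components
vals, vecs = np.linalg.eigh(K)
scale = 1.0 / np.sqrt(sg2 * vals + se2)           # whitening weight per eigendirection
Xr = (vecs.T @ X) * scale[:, None]                # rotated + whitened design
yr = (vecs.T @ y) * scale

model = Lasso(alpha=0.05).fit(Xr, yr)
print("selected features:", np.nonzero(model.coef_)[0])
```

After whitening, the noise covariance sg2*K + se2*I becomes the identity, so ordinary sparse regression applies without being misled by population structure.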
Probit regression and logistic regression are well-known models for classification. In contrast to logistic regression, probit regression has a canonical generalization that allows us to model correlations between the labels. This provides a way to include metadata in the model that correlates the noise in the observation process. We show that the approach leads to the …
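For reference, here is standard (independent-noise) probit regression, the starting point of the abstract: labels y in {-1, +1} with P(y = 1 | x) = Phi(w @ x). The correlated generalization mentioned above would replace the product of univariate Gaussian CDFs with a multivariate Gaussian CDF whose correlation structure is built from metadata; that step is only indicated in a comment. Data and settings are illustrative.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, d = 300, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -1.0, 0.5])
y = np.where(X @ w_true + rng.normal(size=n) > 0, 1.0, -1.0)

def neg_log_lik(w):
    # independent probit likelihood: sum_i log Phi(y_i * w @ x_i)
    # (correlated labels would use the log of one multivariate normal CDF instead)
    return -np.sum(norm.logcdf(y * (X @ w)))

w_hat = minimize(neg_log_lik, np.zeros(d)).x
print("estimated weights:", w_hat)
```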
We consider Hermitian $N \times N$ random matrices $H$ distributed according to a probability density $d\mu_N(H) = e^{-N \operatorname{Tr} V(H)}\, dH$ with analytic and uniformly convex $V$. From work by Zinn-Justin, Collins, and Guionnet & Maida it is known that the large-$N$ limit of the scaled logarithm $N^{-1} \ln \Omega(NK)$ of the Fourier transform $\Omega(K)$ of $d\mu_N(H)$, for $K$ of finite rank, is a …
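Since the extraction mangled the symbols, it may help to restate the objects in standard notation. The normalization constant $Z_N$ and the exact Fourier convention are assumptions here, as the abstract is cut off before fixing them:

```latex
d\mu_N(H) \;=\; \frac{1}{Z_N}\, e^{-N \operatorname{Tr} V(H)}\, dH ,
\qquad
\Omega(K) \;=\; \int e^{\, i \operatorname{Tr}(K H)}\, d\mu_N(H) ,
\qquad
\lim_{N \to \infty} \frac{1}{N} \ln \Omega(NK) .
```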