Stephan Mandt

Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, a class of methods that extends the idea of word embeddings to other types of high-dimensional data. As examples, we study neural data with real-valued observations, count data from a market basket…
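To make the framework concrete, the following is a minimal sketch (my illustration, not the paper's implementation) of one instance: a Poisson embedding for basket count data, where each count is modeled conditionally on its context through the inner product of an item's embedding vector and the summed context vectors of the other items in the same basket. All names, the synthetic data, and the plain gradient-ascent loop are assumptions made for illustration.

import numpy as np

# Minimal Poisson exponential-family-embedding sketch for count data:
# each count X[n, i] is modeled as Poisson with rate exp(rho[i] . context),
# where the context is the count-weighted sum of the context vectors alpha[j]
# of the other items j != i in the same basket n.

rng = np.random.default_rng(0)
N, I, K = 200, 30, 5                                # baskets, items, embedding dim
X = rng.poisson(1.0, size=(N, I)).astype(float)     # stand-in count data

rho = 0.01 * rng.standard_normal((I, K))            # embedding vectors
alpha = 0.01 * rng.standard_normal((I, K))          # context vectors
lr, epochs = 0.05, 30

for epoch in range(epochs):
    total = X @ alpha                                       # (N, K) sum over all items
    ctx = total[:, None, :] - X[:, :, None] * alpha[None]   # exclude item i itself
    eta = np.einsum('nik,ik->ni', ctx, rho)                 # natural parameter
    rate = np.exp(eta)                                      # Poisson mean
    g_eta = X - rate                                        # d log-lik / d eta for Poisson
    g_rho = np.einsum('ni,nik->ik', g_eta, ctx) / N
    sum_gr = g_eta @ rho                                    # (N, K)
    g_alpha = (X.T @ sum_gr - (X * g_eta).sum(axis=0)[:, None] * rho) / N
    rho += lr * g_rho
    alpha += lr * g_alpha
    if epoch % 10 == 0:
        print(epoch, (X * eta - rate).mean())               # log-likelihood up to a constant

Other exponential families (Gaussian for real-valued neural data, Bernoulli for binary data) follow the same pattern by swapping the link and the log-likelihood gradient.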
Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With a constant learning rate, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with a constant rate can be effectively used as an approximate posterior inference algorithm for probabilistic…
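The stationary behavior is easy to see on a toy problem. The sketch below (mine, not the paper's code) runs constant-rate SGD on a one-dimensional quadratic loss with additive gradient noise; the update theta <- theta - eps*(a*theta + n) is an AR(1) process, so its stationary variance V satisfies V = (1 - eps*a)^2 V + eps^2 sigma^2, i.e. V = eps*sigma^2 / (a*(2 - eps*a)). All parameter values are arbitrary choices for the demonstration.

import numpy as np

# Constant-rate SGD on f(theta) = a/2 * theta^2 with gradient noise of variance sigma^2.
# After burn-in, the iterates fluctuate around the optimum instead of converging,
# and their empirical variance matches the closed-form stationary variance.

rng = np.random.default_rng(1)
a, sigma, eps = 1.0, 2.0, 0.1
theta, burn_in, steps = 5.0, 2_000, 200_000

samples = []
for t in range(burn_in + steps):
    grad = a * theta + sigma * rng.standard_normal()   # noisy gradient
    theta -= eps * grad                                # constant learning rate
    if t >= burn_in:
        samples.append(theta)

print("empirical variance :", np.var(samples))
print("predicted variance :", eps * sigma**2 / (a * (2 - eps * a)))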
Transport properties are among the defining characteristics of many important phases in condensed-matter physics. In the presence of strong correlations, they are difficult to predict, even for model systems such as the Hubbard model. In real materials, additional complications arise owing to impurities, lattice defects, or multi-band effects. Ultracold atoms…
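For context, the single-band Hubbard model referred to here has the standard form (hopping amplitude $t$, on-site interaction $U$, fermionic creation/annihilation operators $c^{\dagger}$, $c$, and number operators $n$):

H = -t \sum_{\langle i,j\rangle,\sigma}\left(c^{\dagger}_{i\sigma} c_{j\sigma} + \mathrm{h.c.}\right) + U \sum_{i} n_{i\uparrow} n_{i\downarrow}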
As highly tunable interacting systems, cold atoms in optical lattices are ideal to realize and observe negative absolute temperatures, T<0. We show theoretically that, by reversing the confining potential, stable superfluid condensates at finite momentum and T<0 can be created with low entropy production for attractive bosons. They may serve as "smoking(More)
Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of constant SGD…
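As a hedged illustration of the idea (my sketch, not the paper's procedure), one can treat the constant-SGD iterates after burn-in as approximate posterior samples in Bayesian linear regression, where the exact Gaussian posterior is available for comparison. The model, data, learning rate, and minibatch size below are arbitrary assumptions; how well the sample variance matches the true posterior depends on exactly those tuning parameters, which is the knob the paper shows how to set.

import numpy as np

# Bayesian linear regression: prior w ~ N(0, tau^2 I), likelihood y ~ N(X w, noise^2).
# Run SGD with a constant learning rate on the stochastic negative log-joint and keep
# the iterates after burn-in as approximate posterior samples.

rng = np.random.default_rng(2)
N, D, S = 1_000, 3, 32                  # data points, dimensions, minibatch size
tau, noise, eps = 1.0, 0.5, 1e-4        # prior scale, noise std, constant learning rate

w_true = rng.standard_normal(D)
X = rng.standard_normal((N, D))
y = X @ w_true + noise * rng.standard_normal(N)

# Exact posterior for reference (Gaussian, closed form).
post_cov = np.linalg.inv(X.T @ X / noise**2 + np.eye(D) / tau**2)
post_mean = post_cov @ X.T @ y / noise**2

w = np.zeros(D)
samples = []
for t in range(60_000):
    idx = rng.integers(0, N, size=S)
    # Minibatch gradient of the negative log-joint (likelihood term rescaled by N/S).
    grad = (N / S) * X[idx].T @ (X[idx] @ w - y[idx]) / noise**2 + w / tau**2
    w -= eps * grad
    if t >= 10_000:
        samples.append(w.copy())

samples = np.array(samples)
print("posterior mean (exact) :", post_mean)
print("posterior mean (SGD)   :", samples.mean(axis=0))
print("posterior var  (exact) :", np.diag(post_cov))
print("posterior var  (SGD)   :", samples.var(axis=0))

# The SGD means track the exact posterior mean closely; the variances generally do
# not match until the learning rate and minibatch size are adjusted appropriately.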
In order to avoid overfitting, it is common practice to regularize linear prediction models using squared or absolute-value norms of the model parameters. In our article we consider a new method of regularization: Huber-norm regularization imposes a combination of ℓ1- and ℓ2-norm regularization on the model parameters. We derive the dual optimization…
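As a hedged illustration (my sketch, using the standard Huber function; the article's exact parameterization and dual derivation may differ), the penalty is quadratic for small coefficients and linear for large ones, which is the sense in which it blends ℓ2- and ℓ1-style regularization. Note that the Huber function is applied to the parameters here, not to the residuals as in robust regression; all names and data below are assumptions for illustration.

import numpy as np

def huber_penalty(w, delta=1.0):
    """Elementwise Huber penalty on the weights: quadratic for |w| <= delta
    (l2-like), linear beyond delta (l1-like)."""
    small = np.abs(w) <= delta
    return np.where(small, 0.5 * w**2, delta * (np.abs(w) - 0.5 * delta)).sum()

def objective(w, X, y, lam=0.1, delta=1.0):
    """Least-squares fit plus a Huber-norm regularizer on the model parameters."""
    residual = X @ w - y
    return 0.5 * np.mean(residual**2) + lam * huber_penalty(w, delta)

# Tiny usage example on synthetic data.
rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))
y = X @ np.array([2.0, 0.0, -1.5, 0.0]) + 0.1 * rng.standard_normal(50)
print(objective(np.zeros(4), X, y))

Because the penalty is smooth everywhere, the objective can be minimized with ordinary gradient-based solvers, unlike a pure ℓ1 penalty.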