- Stephan Mandt, Matthew D. Hoffman, David M. Blei
- ICML
- 2016

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic…
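
The stationary behavior described above is easy to see numerically. Below is a minimal sketch, not the paper's code: constant-rate SGD on the 1-D quadratic loss L(θ) = θ²/2, with synthetic Gaussian noise standing in for minibatch gradient noise. After burn-in, the iterates fluctuate around the optimum, and the spread of that stationary distribution grows with the learning rate.

```python
import random
import statistics

def constant_sgd(lr, steps, noise_std=1.0, seed=0):
    """Run constant-rate SGD on L(theta) = theta^2 / 2 with noisy gradients."""
    rng = random.Random(seed)
    theta = 5.0  # start far from the optimum at theta = 0
    trace = []
    for t in range(steps):
        grad = theta + rng.gauss(0.0, noise_std)  # noisy gradient of theta^2/2
        theta -= lr * grad
        if t >= steps // 2:  # keep only post-burn-in iterates
            trace.append(theta)
    return trace

small = constant_sgd(lr=0.01, steps=20000)
large = constant_sgd(lr=0.1, steps=20000)

# A smaller learning rate yields a tighter stationary distribution.
print(statistics.stdev(small) < statistics.stdev(large))  # True
```

This learning-rate dependence of the stationary spread is what makes it plausible to tune constant SGD so that its stationary distribution approximates a target posterior.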

Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, a class of methods that extends the idea of word embeddings to other types of high-dimensional data. As examples, we study neural data with real-valued observations, count data from a market…

- Stephan Mandt, David M. Blei
- NIPS
- 2014

Stochastic variational inference (SVI) lets us scale up Bayesian computation to massive data. It uses stochastic optimization to fit a variational distribution, following easy-to-compute noisy natural gradients. As with most traditional stochastic optimization methods, SVI takes precautions to use unbiased stochastic gradients whose expectations are equal…
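
The unbiasedness requirement mentioned above can be illustrated with a toy sketch (hypothetical data and loss, not from the paper): the gradient of a sum over N data points is estimated from one uniformly sampled point, rescaled by N, which preserves the expectation.

```python
import random

data = [1.0, 2.0, 3.0, 4.0]  # illustrative data set of N = 4 points

def full_gradient(theta):
    """Exact gradient of the sum of squared residuals' loss terms."""
    return sum(theta - x for x in data)

def noisy_gradient(theta, rng):
    """Subsampled estimate: one point's gradient, rescaled by N."""
    x = rng.choice(data)
    return len(data) * (theta - x)  # rescaling keeps the estimate unbiased

rng = random.Random(0)
estimates = [noisy_gradient(0.0, rng) for _ in range(100000)]
mean_est = sum(estimates) / len(estimates)

# The average of many noisy estimates matches the full gradient.
print(abs(mean_est - full_gradient(0.0)) < 0.1)  # True
```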

- Stephan Mandt, Matthew D. Hoffman, David M. Blei
- ArXiv
- 2017

Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of constant SGD…

- Stephan Mandt, James McInerney, Farhan Abrol, Rajesh Ranganath, David M. Blei
- AISTATS
- 2016

Variational inference (VI) combined with data subsampling enables approximate posterior inference with large data sets for otherwise intractable models, but suffers from poor local optima. We first formulate a deterministic annealing approach for the generic class of conditionally conjugate exponential family models. This algorithm uses a temperature…
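
A hedged sketch of the schedule side of this idea: a temperature starts high, smoothing the objective to help escape poor local optima, and decays toward T = 1, where the annealed objective coincides with the ordinary variational objective. The geometric decay below is purely illustrative; it is not the schedule from the paper.

```python
def temperature(step, t0=10.0, decay=0.95):
    """Geometric cooling, clipped at T = 1 (the un-annealed objective)."""
    return max(1.0, t0 * decay ** step)

schedule = [temperature(s) for s in range(100)]
print(schedule[0], schedule[-1])  # 10.0 1.0
```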

- Ulrich Schneider, Lucia Hackermüller, +8 authors Achim Rosch
- 2012

Transport properties are among the defining characteristics of many important phases in condensed matter physics. In the presence of strong correlations they are difficult to predict even for model systems like the Hubbard model. In real materials they are in general obscured by additional complications including impurities, lattice defects or multi-band…

- Akos Rapp, Stephan Mandt, Achim Rosch
- Physical Review Letters
- 2010

As highly tunable interacting systems, cold atoms in optical lattices are ideal to realize and observe negative absolute temperatures, T<0. We show theoretically that, by reversing the confining potential, stable superfluid condensates at finite momentum and T<0 can be created with low entropy production for attractive bosons. They may serve as "smoking…

- Oleksandr Zadorozhnyi, Gunthard Benecke, Stephan Mandt, Tobias Scheffer, Marius Kloft
- ECML/PKDD
- 2016

In order to avoid overfitting, it is common practice to regularize linear prediction models using squared or absolute-value norms of the model parameters. In our article we consider a new method of regularization: Huber-norm regularization imposes a combination of ℓ1- and ℓ2-norm regularization on the model parameters. We derive the dual optimization problem,…
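
For intuition, the classical Huber function applied coordinate-wise behaves like a squared (ℓ2-style) penalty near zero and an absolute-value (ℓ1-style) penalty in the tails. The sketch below uses this standard form with an illustrative threshold δ; it is not the paper's exact formulation.

```python
def huber_penalty(weights, delta=1.0):
    """Coordinate-wise Huber penalty: quadratic for |w| <= delta, linear beyond."""
    total = 0.0
    for w in weights:
        if abs(w) <= delta:
            total += 0.5 * w * w / delta   # smooth l2-like region near zero
        else:
            total += abs(w) - 0.5 * delta  # robust l1-like region in the tails
    return total

print(huber_penalty([0.5, -2.0]))  # 0.125 + 1.5 = 1.625
```

The two branches agree in value and slope at |w| = δ, so the penalty is continuously differentiable, which is what makes the combination attractive for optimization.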

- Robert Bamler, Stephan Mandt
- ICML
- 2017

We present a probabilistic language model for time-stamped text data which tracks the semantic evolution of individual words over time. The model represents words and contexts by latent trajectories in an embedding space. At each moment in time, the embedding vectors are inferred from a probabilistic version of word2vec (Mikolov et al., 2013b). These…
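
The latent-trajectory idea can be caricatured as a Gaussian random walk per word, so that nearby time slices share similar embedding vectors. The dimensions, drift scale, and function below are illustrative assumptions, not the paper's model.

```python
import random

def embedding_trajectory(dim=3, steps=5, drift=0.1, seed=0):
    """One word's embedding as a Gaussian random walk over time slices."""
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # initial embedding
    traj = [list(vec)]
    for _ in range(steps - 1):
        vec = [v + rng.gauss(0.0, drift) for v in vec]  # small semantic drift
        traj.append(list(vec))
    return traj

traj = embedding_trajectory()
print(len(traj), len(traj[0]))  # 5 3
```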

Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that reaches a stationary distribution. We revisit an analysis of SGD in terms of stochastic differential equations in the limit of small constant gradient steps. This limit, which we feel is not appreciated in the…
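
In generic form, such a small-step diffusion limit replaces the discrete SGD update with a stochastic differential equation (a hedged sketch; the exact prefactors depend on the minibatch size and on the paper's specific derivation):

```latex
d\theta(t) = -\nabla L\bigl(\theta(t)\bigr)\,dt + \sqrt{\epsilon}\, B\, dW(t)
```

where ε is the constant learning rate, B Bᵀ models the covariance of the minibatch gradient noise, and W(t) is a standard Wiener process.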