
- Rafal Józefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, Yonghui Wu
- ArXiv
- 2016

In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as…

- Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer
- NIPS
- 2015

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning. The current approach to training them consists in maximizing the likelihood of each token in the sequence given the current (recurrent) state and the previous token. At inference, the…
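The mismatch hinted at above (the model is trained on gold token histories but must condition on its own predictions at inference) is what scheduled sampling targets. A minimal sketch of the core idea, assuming a per-position coin flip whose probability is annealed upward over training; the function name and interface are illustrative, not the paper's implementation:

```python
import random

def choose_inputs(gold_tokens, model_predictions, sampling_prob, rng=random):
    """For each position, feed the gold previous token with probability
    (1 - sampling_prob); otherwise feed the model's own prediction.
    sampling_prob is annealed from 0 toward 1 over training (the 'schedule')."""
    inputs = []
    for gold, pred in zip(gold_tokens, model_predictions):
        inputs.append(pred if rng.random() < sampling_prob else gold)
    return inputs

# sampling_prob = 0.0 recovers ordinary teacher forcing:
assert choose_inputs(["a", "b"], ["x", "y"], 0.0) == ["a", "b"]
# sampling_prob = 1.0 makes the model condition only on its own predictions:
assert choose_inputs(["a", "b"], ["x", "y"], 1.0) == ["x", "y"]
```

Annealing `sampling_prob` gradually exposes the model to its own (possibly erroneous) predictions during training, narrowing the train/inference gap.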

- Noam Shazeer, Ryan Doherty, Colin Evans, Chris Waterson
- ArXiv
- 2016

We present Submatrix-wise Vector Embedding Learner (Swivel), a method for generating low-dimensional feature embeddings from a feature co-occurrence matrix. Swivel performs approximate factorization of the point-wise mutual information matrix via stochastic gradient descent. It uses a piecewise loss with special handling for unobserved co-occurrences, and…
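The approximate PMI factorization described above can be sketched in toy form. This is an illustrative simplification, not Swivel itself: it omits submatrix sharding and substitutes a simplified piecewise loss (squared error on observed cells, a hinge penalizing over-estimated PMI on unobserved cells); all names and constants are assumptions:

```python
import numpy as np

def pmi_factorize(counts, dim=2, steps=2000, lr=0.05, seed=0):
    """Toy sketch: factor a PMI matrix into row/column embeddings by
    gradient descent. Observed cells get a squared-error loss on PMI;
    unobserved cells get a hinge that pushes predicted PMI below zero."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = counts.shape
    total = counts.sum()
    row_sums = counts.sum(axis=1, keepdims=True)
    col_sums = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        pmi = np.log(counts * total / (row_sums * col_sums))
    observed = counts > 0
    W = rng.normal(scale=0.1, size=(n_rows, dim))
    V = rng.normal(scale=0.1, size=(n_cols, dim))
    for _ in range(steps):
        pred = W @ V.T
        # Piecewise gradient: squared error where observed, hinge elsewhere.
        err = np.where(observed, pred - pmi, np.maximum(pred, 0.0))
        gW = err @ V / n_cols
        gV = err.T @ W / n_rows
        W -= lr * gW
        V -= lr * gV
    return W, V
```

On a tiny diagonal count matrix, the reconstruction `W @ V.T` approaches the observed-cell PMI while unobserved cells are held at or below zero.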

- Georg Heigold, Ignacio Moreno, Samy Bengio, Noam Shazeer
- 2016 IEEE International Conference on Acoustics…
- 2016

In this paper we present a data-driven, integrated approach to speaker verification, which maps a test utterance and a few reference utterances directly to a single score for verification and jointly optimizes the system's components using the same evaluation protocol and metric as at test time. Such an approach will result in simple and efficient systems,…

- Noam Shazeer, Joris Pelemans, Ciprian Chelba
- INTERSPEECH
- 2015

- Noam Shazeer, Azalia Mirhoseini, +4 authors Jeff Dean
- ArXiv
- 2017

The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant…
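The conditional-computation idea described above is commonly realized with a sparsely-gated mixture-of-experts layer, where a gating network routes each example to a few experts. A rough sketch of top-k gating alone, with the paper's noise terms and load-balancing losses omitted; names and shapes are illustrative:

```python
import numpy as np

def top_k_gating(logits, k=2):
    """Sketch of sparse top-k gating: keep only the k largest gate logits
    per example, softmax over those, and zero out the rest, so at most
    k experts run per example (conditional computation)."""
    topk_idx = np.argsort(logits, axis=-1)[:, -k:]
    mask = np.zeros_like(logits, dtype=bool)
    np.put_along_axis(mask, topk_idx, True, axis=-1)
    masked = np.where(mask, logits, -np.inf)      # drop non-selected experts
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)  # sparse convex weights

logits = np.array([[1.0, 3.0, 2.0, -1.0]])        # gate scores for 4 experts
gates = top_k_gating(logits, k=2)
# Exactly two experts receive nonzero weight, and the weights sum to 1.
assert (gates > 0).sum() == 2
assert np.isclose(gates.sum(), 1.0)
```

Because the gate output is sparse, the non-selected experts never need to be evaluated, which is how capacity can grow without a proportional increase in computation.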

- Noam Shazeer, Joris Pelemans, Ciprian Chelba
- ArXiv
- 2014

We present a novel family of language model (LM) estimation techniques named Sparse Non-negative Matrix (SNM) estimation. A first set of experiments empirically evaluating it on the One Billion Word Benchmark [Chelba et al., 2013] shows that SNM n-gram LMs perform almost as well as the well-established Kneser-Ney (KN) models. When using skip-gram features…

- Joris Pelemans, Noam Shazeer, Ciprian Chelba
- TACL
- 2016

We present Sparse Non-negative Matrix (SNM) estimation, a novel probability estimation technique for language modeling that can efficiently incorporate arbitrary features. We evaluate SNM language models on two corpora: the One Billion Word Benchmark and a subset of the LDC English Gigaword corpus. Results show that SNM language models trained with n-gram…

- Ciprian Chelba, Noam Shazeer
- ASRU
- 2015

- Georges Harik, Noam Shazeer
- ArXiv
- 2008

We introduce a framework for representing a variety of interesting problems as inference over the execution of probabilistic model programs. We represent a "solution" to such a problem as a guide program which runs alongside the model program and influences the model program's random choices, leading the model program to sample from a different…