#### Filter Results:

- Full text PDF available (13)

#### Publication Year

2014

2017

- This year (2)
- Last 5 years (13)
- Last 10 years (13)

#### Publication Type

#### Co-author

#### Journals and Conferences

#### Data Set Used

#### Key Phrases

Learn More

- Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
- ArXiv
- 2014

Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a… (More)

- Kyunghyun Cho, Bart van Merrienboer, +4 authors Yoshua Bengio
- EMNLP
- 2014

In this paper, we propose a novel neural network model called RNN Encoder– Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixedlength vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained… (More)

- Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio
- SSST@EMNLP
- 2014

Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. The neural machine translation models often consist of an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this… (More)

- Rami Al-Rfou', Guillaume Alain, +109 authors Ying Zhang
- ArXiv
- 2016

Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively… (More)

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis [1, 2] and image caption generation [3]. We extend the attention-mechanism with features needed for speech recognition. We show that while an adaptation of… (More)

- Dzmitry Bahdanau, Philemon Brakel, +5 authors Yoshua Bengio
- ArXiv
- 2016

We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We… (More)

- Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio
- 2016 IEEE International Conference on Acoustics…
- 2016

Many state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) Systems are hybrids of neural networks and Hidden Markov Models (HMMs). Recently, more direct end-to-end methods have been investigated, in which neural architectures were trained to model sequences of characters [1,2]. To our knowledge, all these approaches relied on Connectionist… (More)

- Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
- ArXiv
- 2014

We replace the Hidden Markov Model (HMM) which is traditionally used in in continuous speech recognition with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder… (More)

- Dzmitry Bahdanau, Dmitriy Serdyuk, +4 authors Yoshua Bengio
- ArXiv
- 2015

Often, the performance on a supervised machine learning task is evaluated with a task loss function that cannot be optimized directly. Examples of such loss functions include the classification error, the edit distance and the BLEU score. A common workaround for this problem is to instead optimize a surrogate loss function, such as for instance… (More)

- Bart van Merrienboer, Dzmitry Bahdanau, +4 authors Yoshua Bengio
- ArXiv
- 2015

We introduce two Python frameworks to train neural networks on large datasets: Blocks and Fuel. Blocks is based on Theano, a linear algebra compiler with CUDA-support (Bastien et al., 2012; Bergstra et al., 2010). It facilitates the training of complex neural network models by providing parametrized Theano operations, attaching metadata to Theano’s symbolic… (More)