Yacine Jernite

We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word level. Our model employs a convolutional neural network (CNN) over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on…
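The described architecture can be summarized in a short sketch: character embeddings for each word are convolved and max-pooled into a word representation, which feeds a word-level LSTM that predicts the next word. This is a minimal illustration in PyTorch, not the paper's exact configuration; all dimensions, vocabulary sizes, and the single filter width are assumptions, and the paper's additional components (e.g. multiple filter widths) are omitted.

```python
# Minimal sketch of a character-level CNN feeding a word-level LSTM LM.
# Hyperparameters (char_vocab, word_vocab, dimensions) are illustrative
# assumptions, not the paper's settings.
import torch
import torch.nn as nn

class CharCNNLSTMLM(nn.Module):
    def __init__(self, char_vocab=100, word_vocab=10000,
                 char_emb=15, num_filters=100, kernel_size=3, hidden=300):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_emb)
        # Convolution over the character dimension of each word.
        self.conv = nn.Conv1d(char_emb, num_filters, kernel_size, padding=1)
        self.lstm = nn.LSTM(num_filters, hidden, batch_first=True)
        # Predictions are still made over the word vocabulary.
        self.out = nn.Linear(hidden, word_vocab)

    def forward(self, chars):
        # chars: (batch, seq_len, max_word_len) character ids
        b, t, w = chars.shape
        x = self.char_emb(chars.view(b * t, w))       # (b*t, w, char_emb)
        x = self.conv(x.transpose(1, 2))              # (b*t, filters, w)
        word_repr = torch.relu(x).max(dim=2).values   # max-over-time pooling
        h, _ = self.lstm(word_repr.view(b, t, -1))    # word-level LSTM
        return self.out(h)                            # logits over words

# Usage: next-word logits for 2 sentences of 5 words (<= 12 chars per word).
logits = CharCNNLSTMLM()(torch.randint(0, 100, (2, 5, 12)))
print(logits.shape)  # torch.Size([2, 5, 10000])
```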
Language modelling is a fundamental building block of natural language processing. However, in practice the size of the vocabulary limits the distributions applicable for this task: specifically, one has to either resort to local optimization methods, such as those used in neural language models, or work with heavily constrained distributions. In this…
We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden. Unsupervised learning of these models is a form of discrete factor analysis, enabling the discovery of hidden variables and their causal relationships with observed data.
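For readers unfamiliar with the model class, the following is a small generative sketch of a bipartite noisy-or network (not the learning algorithm from the paper): each observed binary variable turns on unless the noisy links from all of its active hidden parents, together with a leak term, fail. The sizes, prior, failure probabilities, and leak value here are illustrative assumptions.

```python
# Sampling from a bipartite noisy-or network: hidden binary causes in the
# top layer, observed binary variables below.
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_obs = 4, 10
prior = np.full(n_hidden, 0.3)                    # p(hidden_i = 1), assumed
fail = rng.uniform(0.1, 0.9, (n_hidden, n_obs))   # failure prob of edge i -> j
leak = 0.05                                       # background activation prob

h = rng.random(n_hidden) < prior                  # sample the hidden layer
# Noisy-or CPD: p(obs_j = 0 | h) = (1 - leak) * prod over active parents of fail[i, j]
p_off = (1 - leak) * np.prod(np.where(h[:, None], fail, 1.0), axis=0)
x = rng.random(n_obs) < (1 - p_off)               # sample the observed layer
print(h.astype(int), x.astype(int))
```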
Recurrent neural networks (RNNs) have been used extensively and with increasing success to model various types of sequential data. Much of this progress has been achieved through devising recurrent units and architectures with the flexibility to capture complex statistics in the data, such as long range dependency or localized attention phenomena. However, …
This paper addresses the problem of multi-class classification with an extremely large number of classes, where the class predictor is learned jointly with the data representation, as is the case in language modeling problems. The predictor admits a hierarchical structure, which allows for efficient handling of settings that deal with a very large number…
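A hierarchical predictor of this kind can be illustrated with a two-level softmax: the class probability factorizes into a cluster term and a within-cluster term, so each example touches only one small block of the output layer instead of all classes. The sketch below uses a fixed, equal-sized partition of the classes purely for illustration, whereas in the paper the predictor is learned jointly with the data representation; all names and sizes are assumptions.

```python
# Illustrative two-level hierarchical softmax:
# p(class | x) = p(cluster | x) * p(class | cluster, x).
import torch
import torch.nn as nn

class TwoLevelSoftmax(nn.Module):
    def __init__(self, dim=128, num_classes=100000, num_clusters=317):
        super().__init__()
        self.num_clusters = num_clusters
        self.cluster_size = -(-num_classes // num_clusters)  # ceil division
        self.cluster_logits = nn.Linear(dim, num_clusters)
        self.within_logits = nn.Linear(dim, num_clusters * self.cluster_size)

    def log_prob(self, x, y):
        # x: (batch, dim) representations, y: (batch,) class indices
        cluster = y // self.cluster_size
        within = y % self.cluster_size
        log_p_cluster = torch.log_softmax(self.cluster_logits(x), dim=-1)
        # Only the chosen cluster's block of within-cluster logits is computed,
        # so the per-example cost is O(num_clusters + cluster_size),
        # not O(num_classes).
        w = self.within_logits.weight.view(self.num_clusters, self.cluster_size, -1)
        b = self.within_logits.bias.view(self.num_clusters, self.cluster_size)
        block = torch.einsum('bd,bkd->bk', x, w[cluster]) + b[cluster]
        log_p_within = torch.log_softmax(block, dim=-1)
        return (log_p_cluster.gather(1, cluster[:, None]).squeeze(1)
                + log_p_within.gather(1, within[:, None]).squeeze(1))

# Usage: log-probabilities of 4 labels among 100,000 classes.
x, y = torch.randn(4, 128), torch.randint(0, 100000, (4,))
print(TwoLevelSoftmax().log_prob(x, y).shape)  # torch.Size([4])
```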