The CNN models discussed herein improve upon the state of the art on 4 of 7 tasks, including sentiment analysis and question classification, and a simple modification to the architecture allows the use of both task-specific and static word vectors.
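As a rough illustration (not the authors' code), a two-channel convolutional classifier might keep one pretrained embedding table frozen ("static") while fine-tuning a second copy ("task-specific"); here the two channels are simply concatenated along the feature dimension, and all sizes are hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoChannelCNN(nn.Module):
    # Hypothetical sizes; the point is the frozen vs. trainable embedding channels.
    def __init__(self, vocab_size=10000, emb_dim=300, num_classes=2, num_filters=100):
        super().__init__()
        self.static_emb = nn.Embedding(vocab_size, emb_dim)
        self.static_emb.weight.requires_grad = False        # pretrained, kept fixed
        self.tuned_emb = nn.Embedding(vocab_size, emb_dim)   # fine-tuned on the task
        self.convs = nn.ModuleList(
            [nn.Conv1d(2 * emb_dim, num_filters, k) for k in (3, 4, 5)]
        )
        self.out = nn.Linear(3 * num_filters, num_classes)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        x = torch.cat([self.static_emb(tokens), self.tuned_emb(tokens)], dim=-1)
        x = x.transpose(1, 2)                        # (batch, channels, seq_len)
        pooled = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.out(torch.cat(pooled, dim=1))    # max-over-time pooling, then logits
```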
A simple neural language model that relies only on character-level inputs yet encodes both semantic and orthographic information, suggesting that for many languages character inputs are sufficient for language modeling.
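A minimal sketch of the idea, with hypothetical sizes: word representations are built by a CNN over character embeddings (the original also uses multiple filter widths and highway layers, omitted here) and then fed to a word-level LSTM language model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharWordLM(nn.Module):
    # Hypothetical sizes; the key idea is that word representations come from a
    # CNN over character embeddings rather than from a word embedding table.
    def __init__(self, num_chars=100, char_dim=15, num_filters=100,
                 hidden=300, vocab_size=10000):
        super().__init__()
        self.char_emb = nn.Embedding(num_chars, char_dim)
        self.conv = nn.Conv1d(char_dim, num_filters, kernel_size=3)
        self.lstm = nn.LSTM(num_filters, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)     # predicts the next word

    def forward(self, chars):                         # chars: (batch, seq_len, word_len)
        b, t, w = chars.shape
        e = self.char_emb(chars.view(b * t, w)).transpose(1, 2)   # (b*t, char_dim, w)
        word_vecs = F.relu(self.conv(e)).max(dim=2).values        # max over characters
        h, _ = self.lstm(word_vecs.view(b, t, -1))
        return self.out(h)                            # (batch, seq_len, vocab_size)
```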
This work proposes a flexible method for training deep latent variable models of discrete structures based on the recently proposed Wasserstein autoencoder (WAE), and shows that the latent representation can be trained to perform unaligned textual style transfer, with improvements in both automatic and human evaluation over existing methods.
It is demonstrated that standard knowledge distillation applied to word-level prediction can be effective for NMT, and two novel sequence-level versions of knowledge distillation are introduced that further improve performance and, somewhat surprisingly, appear to eliminate the need for beam search.
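A sketch of the word-level variant under assumed shapes (the interpolation weight `alpha` is a hypothetical choice, not taken from the paper): the student is trained against the teacher's per-position distribution in addition to the gold next token.

```python
import torch
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits, gold_ids, alpha=0.5):
    """Mix the usual NLL on gold tokens with cross-entropy against the teacher's
    per-position distribution. Shapes: logits are (batch, seq_len, vocab),
    gold_ids is (batch, seq_len)."""
    nll = F.cross_entropy(student_logits.transpose(1, 2), gold_ids)
    teacher_probs = F.softmax(teacher_logits, dim=-1)
    kd = -(teacher_probs * F.log_softmax(student_logits, dim=-1)).sum(-1).mean()
    return alpha * kd + (1 - alpha) * nll
```

The sequence-level versions instead train the student directly on the teacher's beam-search outputs rather than on per-position distributions.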
A method for automatically detecting change in language across time through a chronologically trained neural language model, which identifies words such as "cell" and "gay" as having changed meaning over the period studied.
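One way to surface such words, sketched here with a hypothetical data layout (per-year vectors produced by the chronologically trained model), is to rank words by the cosine distance between their earliest and latest representations:

```python
import numpy as np

def semantic_change(vectors_by_year, word, start_year, end_year):
    """Given per-year word vectors (vectors_by_year[year][word] -> np.ndarray,
    a hypothetical layout), score how much a word drifted as 1 minus the cosine
    similarity between its earliest and latest vectors."""
    v0 = vectors_by_year[start_year][word]
    v1 = vectors_by_year[end_year][word]
    cos = np.dot(v0, v1) / (np.linalg.norm(v0) * np.linalg.norm(v1))
    return 1.0 - cos   # larger means more change (e.g. "cell", "gay")
```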
A formalization of the grammar induction problem that models sentences as generated by a compound probabilistic context-free grammar whose rule probabilities are modulated by a per-sentence continuous latent variable, inducing marginal dependencies beyond the traditional context-free assumptions.
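A minimal sketch of the parameterization under hypothetical sizes (not the authors' code): scores for rules A → B C are produced from the nonterminal embedding together with the per-sentence latent vector z, so rule probabilities vary from sentence to sentence:

```python
import torch
import torch.nn as nn

class CompoundRuleScorer(nn.Module):
    # Hypothetical sizes; each sentence gets a latent vector z, and rule
    # log-probabilities depend on both the nonterminal embedding and z.
    def __init__(self, num_nt=30, num_pt=60, dim=64, z_dim=32):
        super().__init__()
        self.nt_emb = nn.Embedding(num_nt, dim)
        self.mlp = nn.Sequential(nn.Linear(dim + z_dim, dim), nn.ReLU(),
                                 nn.Linear(dim, (num_nt + num_pt) ** 2))

    def forward(self, z):                          # z: (batch, z_dim), one per sentence
        a = self.nt_emb.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        zz = z.unsqueeze(1).expand(-1, a.size(1), -1)
        scores = self.mlp(torch.cat([a, zz], dim=-1))   # (batch, num_nt, choices of B C)
        return scores.log_softmax(dim=-1)          # log pi_{A -> B C | z}
```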
This work proposes a hybrid approach that uses amortized variational inference (AVI) to initialize the variational parameters and then runs stochastic variational inference (SVI) to refine them, enabling the use of rich generative models without the posterior-collapse phenomenon common when training VAEs for problems like text generation.
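A sketch of the refinement step under assumed names (`decoder(x, z)` returning log p(x | z) is hypothetical): the inference network's output serves only as the starting point for a few per-example SVI gradient steps on the ELBO:

```python
import torch

def refine_variational_params(mu, logvar, decoder, x, steps=5, lr=1e-2):
    """Start from the inference network's output (mu, logvar) and take a few
    gradient steps on the per-example negative ELBO before decoding."""
    mu = mu.detach().clone().requires_grad_(True)
    logvar = logvar.detach().clone().requires_grad_(True)
    opt = torch.optim.SGD([mu, logvar], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()          # reparameterize
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)    # KL(q || N(0, I))
        loss = (kl - decoder(x, z)).mean()                             # negative ELBO
        loss.backward()
        opt.step()
    return mu, logvar
```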
An inference network parameterized as a neural CRF constituency parser is developed to maximize the evidence lower bound and thereby apply amortized variational inference to unsupervised learning of recurrent neural network grammars (RNNGs).
This work shows that structured attention networks are simple extensions of the basic attention procedure and allow attention to go beyond the standard soft-selection approach, for example attending to partial segmentations or to subtrees.
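As an illustration of the mechanism (not the authors' code), the attention weights of a linear-chain structured attention layer can be taken to be CRF marginals, computed here via the identity that marginals are the gradient of the log-partition function with respect to the unary potentials:

```python
import torch

def chain_crf_marginals(unary, trans):
    """Marginal probabilities of each state at each position under a linear-chain
    CRF, used as structured attention weights instead of a plain softmax.
    unary: (seq_len, num_states), trans: (num_states, num_states)."""
    unary = unary.detach().clone().requires_grad_(True)
    alpha = unary[0]
    for t in range(1, unary.size(0)):
        # standard forward recursion in log space
        alpha = torch.logsumexp(alpha.unsqueeze(1) + trans, dim=0) + unary[t]
    log_z = torch.logsumexp(alpha, dim=0)
    marginals, = torch.autograd.grad(log_z, unary)
    return marginals        # (seq_len, num_states); each row sums to 1
```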
It is shown that, compared to existing VAE architectures, generative skip models maintain similar predictive performance while suffering less posterior collapse and providing more meaningful representations of the data.
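A minimal sketch of a skip decoder for text, with hypothetical sizes: the latent z is wired into every decoder step and into the output layer rather than only into the initial hidden state, strengthening the dependence of the observations on z:

```python
import torch
import torch.nn as nn

class SkipDecoder(nn.Module):
    # Hypothetical sizes; z is concatenated to every decoder input and to the
    # output layer (the "skip" connections), discouraging posterior collapse.
    def __init__(self, vocab_size=10000, emb_dim=128, z_dim=32, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim + z_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden + z_dim, vocab_size)

    def forward(self, tokens, z):               # tokens: (batch, seq_len), z: (batch, z_dim)
        zt = z.unsqueeze(1).expand(-1, tokens.size(1), -1)
        h, _ = self.rnn(torch.cat([self.emb(tokens), zt], dim=-1))
        return self.out(torch.cat([h, zt], dim=-1))
```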