This work proposes a method for representing latent topical confounds and a model that “unlearns” confounding features by predicting both the label of the input text and the confound; it shows that this model generalizes better and learns features indicative of writing style rather than content.
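A minimal sketch of one common way such "unlearning" is implemented, using a shared encoder with a label head and an adversarial confound head behind a gradient-reversal layer; the architecture, layer sizes, and the gradient-reversal mechanism are illustrative assumptions, not details taken from the summary.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

class ConfoundAwareClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden, n_labels, n_confounds):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, emb_dim)
        self.encoder = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU())
        self.label_head = nn.Linear(hidden, n_labels)        # main task
        self.confound_head = nn.Linear(hidden, n_confounds)  # adversary

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        # Gradient reversal pushes the encoder to discard features that
        # predict the confound (e.g. topic), while the label head keeps
        # features predictive of the actual label.
        return self.label_head(h), self.confound_head(GradReverse.apply(h))
```

Training would then minimize the sum of cross-entropy losses from both heads, so that improving confound prediction actively degrades the confound-related features in the shared encoder.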
This work proposes a general technique for replacing the softmax layer with a continuous embedding layer, introducing a novel probabilistic loss together with a training and inference procedure in which the model generates a probability distribution over pre-trained word embeddings instead of a multinomial distribution over the vocabulary obtained via softmax.
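A minimal sketch of the continuous-output idea, assuming a PyTorch decoder and a fixed matrix `pretrained_emb` of word embeddings; the probabilistic loss itself is not specified in this summary, so a simple cosine-distance surrogate stands in for it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContinuousOutputHead(nn.Module):
    """Stands in for the softmax layer: predicts a point in embedding space."""
    def __init__(self, hidden_dim, emb_dim):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, emb_dim)

    def forward(self, decoder_state):
        return self.proj(decoder_state)                  # (batch, emb_dim)

def embedding_loss(pred_vec, target_ids, pretrained_emb):
    """Surrogate for the probabilistic loss: penalize the cosine distance
    between the prediction and the pretrained embedding of the gold token."""
    target_vec = pretrained_emb[target_ids]              # (batch, emb_dim)
    return (1.0 - F.cosine_similarity(pred_vec, target_vec, dim=-1)).mean()

def decode_step(pred_vec, pretrained_emb):
    """Inference: pick the vocabulary item whose pretrained embedding is
    closest (by cosine similarity) to the predicted vector."""
    sims = F.normalize(pred_vec, dim=-1) @ F.normalize(pretrained_emb, dim=-1).T
    return sims.argmax(dim=-1)                           # predicted token ids
```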
A novel framework for ASAG (automatic short answer grading) is introduced by cascading three neural building blocks: Siamese bidirectional LSTMs applied to a model answer and a student answer, a novel pooling layer based on earth-mover distance (EMD) across all hidden states from both LSTMs, and a flexible final regression layer to output scores.
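A rough sketch of the three-block cascade in PyTorch; exact EMD requires solving an optimal-transport problem, so a relaxed nearest-neighbor transport stands in for the EMD pooling layer here, and the regression features are illustrative.

```python
import torch
import torch.nn as nn

class SiameseEMDGrader(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # One shared (Siamese) BiLSTM encodes both the model and student answers.
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.regressor = nn.Linear(2, 1)   # maps distance features to a score

    def encode(self, ids):
        out, _ = self.lstm(self.emb(ids))  # (batch, seq, 2*hidden) hidden states
        return out

    @staticmethod
    def relaxed_emd(h_model, h_student):
        """Relaxed earth-mover distance between the two sets of hidden states:
        each state is 'transported' to its nearest counterpart."""
        cost = torch.cdist(h_model, h_student)           # (batch, Lm, Ls)
        return 0.5 * (cost.min(dim=2).values.mean(dim=1)
                      + cost.min(dim=1).values.mean(dim=1))

    def forward(self, model_ids, student_ids):
        h_m, h_s = self.encode(model_ids), self.encode(student_ids)
        d = self.relaxed_emd(h_m, h_s)
        # Simple regression on the distance (the actual model likely uses a
        # richer feature set).
        feats = torch.stack([d, d.pow(2)], dim=-1)
        return self.regressor(feats).squeeze(-1)         # predicted score
```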
This work formulates the decoding process as an optimization problem in which the multiple attributes to be controlled can be easily incorporated as differentiable constraints, by relaxing the discrete optimization over token sequences to a continuous one.
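A hedged sketch of decoding as continuous optimization: each output position is relaxed to a distribution over the vocabulary and optimized by gradient descent under the model score plus differentiable constraint penalties. `logp_fn`, the `constraints` list, the optimizer settings, and the final argmax projection are placeholders, not the actual system's choices.

```python
import torch

def constrained_decode(logp_fn, constraints, seq_len, vocab_size,
                       steps=200, lr=0.1):
    """Decode by optimizing a relaxed (continuous) output sequence."""
    # One unconstrained logit vector per output position.
    soft_logits = torch.zeros(seq_len, vocab_size, requires_grad=True)
    opt = torch.optim.Adam([soft_logits], lr=lr)
    for _ in range(steps):
        soft_seq = torch.softmax(soft_logits, dim=-1)    # relaxation to the simplex
        # Maximize model log-probability while keeping constraint penalties low.
        loss = -logp_fn(soft_seq) + sum(c(soft_seq) for c in constraints)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Project the relaxed sequence back to discrete tokens.
    return torch.softmax(soft_logits, dim=-1).argmax(dim=-1)
```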
This work proposes an end-to-end cross-lingual text summarization model that uses reinforcement learning to directly optimize a bilingual semantic similarity metric between summaries generated in the target language and gold summaries in the source language.
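A minimal sketch of the training signal under a single-sample REINFORCE estimator; `bilingual_similarity` and the baseline are placeholders (e.g. a multilingual sentence-embedding similarity), and the actual model's policy-gradient variant may differ.

```python
import torch

def reinforce_loss(sample_log_probs, generated_summary, gold_source_summary,
                   bilingual_similarity, baseline=0.0):
    """One policy-gradient step: the reward is a bilingual semantic similarity
    score between the generated target-language summary and the gold
    source-language summary."""
    reward = bilingual_similarity(generated_summary, gold_source_summary)
    # Maximizing expected reward == minimizing -(reward - baseline) * log p(sample).
    return -(reward - baseline) * sample_log_probs.sum()
```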
This work proposes a general framework to rapidly adapt MT systems to generate language varieties that are close to, but different from, the standard target language, using no parallel (source–variety) data.
This work presents a hierarchical encoder based on structural attention to model inter-sentence and inter-document dependencies, and shows that the proposed model achieves significant improvements over the baseline in both single- and multi-document summarization settings.
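A simplified sketch of the hierarchical encoding idea (word-level BiLSTM producing sentence vectors, then attention over sentences); ordinary multi-head self-attention stands in here for structural attention, whose latent-structure computation is not reproduced, and the multi-document level is omitted.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Words -> sentence vectors -> attention over sentences."""
    def __init__(self, vocab_size, emb_dim=128, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.sent_lstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                                 batch_first=True)
        self.doc_attn = nn.MultiheadAttention(2 * hidden, num_heads=4,
                                              batch_first=True)

    def forward(self, docs):                  # docs: (batch, n_sents, n_words)
        b, s, w = docs.shape
        words = self.emb(docs.view(b * s, w))
        _, (h, _) = self.sent_lstm(words)     # h: (2, b*s, hidden)
        sents = h.permute(1, 0, 2).reshape(b, s, -1)   # sentence vectors
        # Inter-sentence dependencies via attention over sentence vectors.
        ctx, _ = self.doc_attn(sents, sents, sents)
        return ctx                            # contextualized sentence reps
```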
This work presents syn-margin loss, a novel margin-based loss that uses a synthetic negative sample constructed from only the predicted and target embeddings at every step, and finds that it provides small but significant improvements over both vMF and standard margin-based losses in continuous-output neural machine translation.
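An illustrative sketch of a margin loss whose negative sample is built only from the predicted and target embeddings; constructing the negative as the prediction's component orthogonal to the target is an assumption for illustration and may differ from the paper's construction.

```python
import torch
import torch.nn.functional as F

def syn_margin_loss(pred, target, margin=0.5):
    """Margin loss with a synthetic negative derived from pred and target."""
    target_n = F.normalize(target, dim=-1)
    # Synthetic negative: strip from the prediction its component along the target.
    proj = (pred * target_n).sum(-1, keepdim=True) * target_n
    negative = F.normalize(pred - proj, dim=-1)
    pos = F.cosine_similarity(pred, target, dim=-1)     # similarity to gold
    neg = F.cosine_similarity(pred, negative, dim=-1)   # similarity to negative
    return F.relu(margin - pos + neg).mean()
```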
While this approach, without any pretraining, is more stable during training and outperforms other GAN-based approaches, it still falls behind MLE; this gap is found to be due to the autoregressive nature of text generation and its architectural requirements, as well as a fundamental difference between how the Wasserstein distance is defined in the image and text domains.
This work proposes a sampling procedure that combines the log-likelihood of the language model with arbitrary differentiable constraints into a single energy function, and generates samples by initializing the entire output sequence with noise and following a Markov chain defined by Langevin dynamics using the gradients of this energy.
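A compact sketch of such a sampler, assuming the output sequence is represented continuously (e.g. as soft token embeddings); `energy_fn` is a placeholder that should combine the LM negative log-likelihood with the constraint penalties, and the step size and iteration count are illustrative.

```python
import torch

def langevin_sample(energy_fn, seq_len, emb_dim, steps=500, step_size=0.1):
    """Langevin-dynamics sampling over a continuous sequence representation."""
    # Initialize the entire output sequence with noise.
    x = torch.randn(seq_len, emb_dim, requires_grad=True)
    for _ in range(steps):
        # Gradient of the energy (LM negative log-likelihood + constraints).
        grad, = torch.autograd.grad(energy_fn(x), x)
        noise = torch.randn_like(x) * (2.0 * step_size) ** 0.5
        # Langevin update: step down the energy gradient, plus Gaussian noise.
        x = (x - step_size * grad + noise).detach().requires_grad_(True)
    # The continuous sample is mapped back to tokens afterwards,
    # e.g. by nearest-embedding lookup.
    return x.detach()
```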