We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word level. Our model employs a convolutional neural network (CNN) over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. …
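A minimal sketch of this character-to-word pipeline (PyTorch; the module sizes, single filter width, and omission of any highway layers are simplifying assumptions, not the paper's configuration):

    import torch
    import torch.nn as nn

    class CharCNNWordLM(nn.Module):
        """Character-level CNN feeding a word-level LSTM LM. Sizes are illustrative."""
        def __init__(self, n_chars, n_words, char_dim=15, n_filters=100,
                     kernel=5, hidden=300):
            super().__init__()
            self.char_emb = nn.Embedding(n_chars, char_dim)
            # Convolution over the characters of each word.
            self.conv = nn.Conv1d(char_dim, n_filters, kernel_size=kernel,
                                  padding=kernel // 2)
            self.lstm = nn.LSTM(n_filters, hidden, batch_first=True)
            self.proj = nn.Linear(hidden, n_words)  # word-level softmax layer

        def forward(self, chars):
            # chars: (batch, seq_len, max_word_len) character ids
            b, t, w = chars.shape
            x = self.char_emb(chars.reshape(b * t, w))   # (b*t, w, char_dim)
            x = self.conv(x.transpose(1, 2))             # (b*t, filters, w)
            x = torch.relu(x).max(dim=2).values          # max-over-time pooling
            h, _ = self.lstm(x.reshape(b, t, -1))        # word-level LSTM
            return self.proj(h)                          # (b, t, n_words) logits

Each word is thus represented only through its characters, yet the LSTM predicts over the word vocabulary.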
Summarization based on text extraction is inherently limited, but generation-style abstractive methods have proven challenging to build. In this work, we propose a fully data-driven approach to abstractive sentence summarization. Our method utilizes a local attention-based model that generates each word of the summary conditioned on the input sentence. …
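In the usual attention notation (a sketch of the general mechanism, not the paper's exact parameterization), the context used to generate output word i is a weighted sum of input representations x_j:

    \alpha_{ij} = \frac{\exp\big(s(h_i, x_j)\big)}{\sum_{j'} \exp\big(s(h_i, x_{j'})\big)},
    \qquad c_i = \sum_j \alpha_{ij}\, x_j

where h_i is the decoder state and s is a learned scoring function; the next summary word is drawn from a softmax conditioned on h_i and c_i.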
This paper introduces algorithms for non-projective parsing based on dual decomposition. We focus on parsing algorithms for non-projective head automata, a generalization of head-automata models to non-projective structures. The dual decomposition algorithms are simple and efficient, relying on standard dynamic programming and minimum spanning tree algorithms. …
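A hedged sketch of the generic subgradient loop such a decomposition runs; solve_automata and solve_mst stand in for the two oracles over shared arc variables, and the decaying step size is an assumption:

    def dual_decompose(solve_automata, solve_mst, arcs, iters=100, step=1.0):
        """Subgradient method: penalize disagreement on shared arc variables."""
        u = {a: 0.0 for a in arcs}            # one Lagrange multiplier per arc
        for t in range(iters):
            y = solve_automata(u)             # DP oracle; arc scores shifted by +u
            z = solve_mst(u)                  # MST oracle; arc scores shifted by -u
            if y == z:                        # agreement => provably exact parse
                return y, True
            for a in arcs:                    # subgradient update on the dual
                u[a] -= (step / (t + 1)) * (y[a] - z[a])
        return y, False                       # no certificate within iteration limit

Here y and z are 0/1 indicator maps over arcs; when they coincide, the dual value matches a feasible primal value, which is the certificate of optimality.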
This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for sub-problems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. …
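In outline (standard dual-decomposition notation, not necessarily the paper's exact formulation): to maximize f(y) + g(z) subject to the agreement constraint y = z, one relaxes the constraint with multipliers u and minimizes the resulting dual:

    L(u) = \max_{y}\,\big[f(y) + u^\top y\big] + \max_{z}\,\big[g(z) - u^\top z\big],
    \qquad \min_u L(u) \;\ge\; \max_{y = z}\, \big[f(y) + g(z)\big]

Each inner maximization is handled by an off-the-shelf dynamic-programming oracle, and \min_u L(u) coincides with the optimum of an LP relaxation of the joint problem.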
Abstractive Sentence Summarization generates a shorter version of a given sentence while attempting to preserve its meaning. We introduce a conditional recurrent neural network (RNN) which generates a summary of an input sentence. The conditioning is provided by a novel convolutional attention-based encoder which ensures that the decoder focuses on the appropriate input words. …
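A minimal sketch of such a conditioned decoder step (PyTorch; the single convolution layer, dot-product attention, and all sizes are illustrative assumptions rather than the paper's architecture):

    import torch
    import torch.nn as nn

    class ConvEncoder(nn.Module):
        """Convolutional encoder whose per-position outputs are attended over."""
        def __init__(self, vocab, dim=256):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)

        def forward(self, src):               # src: (batch, src_len) word ids
            e = self.emb(src)                 # (batch, src_len, dim)
            return torch.relu(self.conv(e.transpose(1, 2))).transpose(1, 2)

    def decode_step(dec_rnn, out_proj, y_prev_emb, state, enc):
        # dec_rnn: e.g. nn.GRU(dim, dim, batch_first=True);
        # out_proj: e.g. nn.Linear(2 * dim, vocab)
        h, state = dec_rnn(y_prev_emb.unsqueeze(1), state)   # (batch, 1, dim)
        scores = torch.bmm(h, enc.transpose(1, 2))           # attend over source
        ctx = torch.bmm(torch.softmax(scores, dim=-1), enc)  # (batch, 1, dim)
        return out_proj(torch.cat([h, ctx], dim=-1)), state  # next-word logits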
We introduce a simple, non-linear mention-ranking model for coreference resolution that attempts to learn distinct feature representations for anaphoricity detection and antecedent ranking, which we encourage by pre-training on a pair of corresponding subtasks. Although we use only simple, unconjoined features, the model is able to learn useful …
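A hedged sketch of a non-linear mention-ranking scorer with separate representations for the two subtasks (layer sizes and the exact scoring layout are assumptions):

    import torch
    import torch.nn as nn

    class MentionRanker(nn.Module):
        """Separate hidden layers for anaphoricity and antecedent-pair features."""
        def __init__(self, feat_dim, hidden=200):
            super().__init__()
            self.ana = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh())
            self.ant = nn.Sequential(nn.Linear(2 * feat_dim, hidden), nn.Tanh())
            self.score_ana = nn.Linear(hidden, 1)  # score for "starts a new entity"
            self.score_ant = nn.Linear(hidden, 1)  # score per candidate antecedent

        def forward(self, mention, antecedents):
            # mention: (feat_dim,); antecedents: (n_ante, feat_dim)
            eps = self.score_ana(self.ana(mention))               # (1,)
            pairs = torch.cat([antecedents,
                               mention.expand_as(antecedents)], dim=-1)
            s = self.score_ant(self.ant(pairs)).squeeze(-1)       # (n_ante,)
            return torch.cat([eps, s])  # argmax picks antecedent or new entity

One reading of the pre-training is to first fit self.ana on an anaphoricity-detection objective and self.ant on antecedent ranking before joint training.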
There is compelling evidence that coreference prediction would benefit from modeling global information about entity clusters. Yet, state-of-the-art performance can be achieved with systems treating each mention prediction independently, which we attribute to the inherent difficulty of crafting informative cluster-level features. We instead propose to use recurrent neural networks (RNNs) to learn latent, global representations of entity clusters directly from their mentions. …
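A minimal sketch of the cluster-representation idea (a hedged reading of the approach; the RNN cell and dimensions are assumptions):

    import torch
    import torch.nn as nn

    # An RNN consumes a cluster's mention embeddings in order; its final hidden
    # state serves as a learned, global cluster representation.
    dim = 128
    cluster_rnn = nn.RNN(dim, dim, batch_first=True)

    def cluster_state(mention_embs):            # mention_embs: (n_mentions, dim)
        _, h = cluster_rnn(mention_embs.unsqueeze(0))
        return h.squeeze(0).squeeze(0)          # (dim,) latent cluster features

    # When deciding whether a new mention joins a cluster, its embedding can be
    # scored against cluster_state(...) instead of hand-crafted cluster features.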
1. Whole-cell and single-channel Na+ currents were recorded from small (ca. 20 µm diameter) cells isolated from adult rat dorsal root ganglia (DRG). Currents were classified by their sensitivity to 0.3 µM tetrodotoxin (TTX), electrophysiological properties and single-channel amplitude. Cells were classified according to the types of current recorded. …
We describe an open-source toolkit for neural machine translation (NMT). The toolkit prioritizes efficiency, modularity, and extensibility with the goal of supporting NMT research into model architectures, feature representations, and source modalities, while maintaining competitive performance and reasonable training requirements. The toolkit consists of modeling and translation support, as well as detailed pedagogical documentation about the underlying techniques. …
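A generic sketch of the modular encoder/decoder decomposition such a toolkit is organized around (illustrative structure only, not the toolkit's actual API):

    import torch.nn as nn

    class NMTModel(nn.Module):
        """Swappable components: any encoder/decoder pair with matching interfaces."""
        def __init__(self, encoder, decoder, generator):
            super().__init__()
            self.encoder, self.decoder, self.generator = encoder, decoder, generator

        def forward(self, src, tgt):
            memory, state = self.encoder(src)           # e.g. RNN, CNN, other encoder
            out, _ = self.decoder(tgt, memory, state)   # attention-based decoder
            return self.generator(out)                  # distribution over target words

Keeping the three components behind small interfaces is what lets researchers swap architectures, feature representations, or source modalities independently.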
We describe an exact decoding algorithm for syntax-based statistical translation. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable sub-problems, thereby avoiding exhaustive dynamic programming. The method recovers exact solutions, with certificates of optimality, on over 97% of test examples; it has comparable speed to state-of-the-art decoders. …
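The certificates follow from a standard Lagrangian-relaxation argument (sketched here in generic notation): for any multipliers u, the dual value L(u) upper-bounds the true optimum, so if the relaxed sub-problems agree on a structure \hat{y}, then

    L(u) \;\ge\; \max_y f(y) = f(y^\ast)
    \quad\text{and}\quad
    L(u) = f(\hat{y})
    \;\;\Rightarrow\;\; f(\hat{y}) \ge f(y^\ast)

i.e. \hat{y} is provably an exact solution to the original decoding problem, with no exhaustive search required.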