Learn More
It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across(More)
Minimum Error Rate Training (MERT) is an effective means to estimate the feature function weights of a linear model such that an automated evaluation criterion for measuring system performance can directly be optimized in training. To accomplish this, the training procedure determines for each feature function its exact error surface on a given set of(More)
We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subprob-lems that can be solved separately, thus making it trivially parallelizable. On the Stanford Natural Language Inference (SNLI) dataset, we obtain state-of-the-art results with almost an order of magnitude fewer parameters(More)
A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection , enabled by an initial, low-quality batch translation. In contrast to other approaches which require specialized meta-data, the system uses only the textual content of the documents. Results are(More)
When translating among languages that differ substantially in word order, machine translation (MT) systems benefit from syntactic pre-ordering—an approach that uses features from a syntactic parse to permute source words into a target-language-like order. This paper presents a method for inducing parse trees automatically from a parallel corpus, instead of(More)
In statistical language modeling, one technique to reduce the problematic effects of data spar-sity is to partition the vocabulary into equivalence classes. In this paper we investigate the effects of applying such a technique to higher-order n-gram models trained on large corpora. We introduce a modification of the exchange clustering algorithm with(More)
Temporal resolution systems are traditionally tuned to a particular language, requiring significant human effort to translate them to new languages. We present a language independent semantic parser for learning the interpretation of temporal phrases given only a corpus of utterances and the times they reference. We make use of a latent parse that encodes a(More)
Reading an article and answering questions about its content is a fundamental task for natural language understanding. While most successful neural approaches to this problem rely on recurrent neural networks (RNNs), training RNNs over long documents can be prohibitively slow. We present a novel framework for question answering that can efficiently scale to(More)