- Full text PDF available (16)
- This year (4)
- Last 5 years (9)
- Last 10 years (15)
Data Set Used
It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across… (More)
Minimum Error Rate Training (MERT) is an effective means to estimate the feature function weights of a linear model such that an automated evaluation criterion for measuring system performance can directly be optimized in training. To accomplish this, the training procedure determines for each feature function its exact error surface on a given set of… (More)
We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subprob-lems that can be solved separately, thus making it trivially parallelizable. On the Stanford Natural Language Inference (SNLI) dataset, we obtain state-of-the-art results with almost an order of magnitude fewer parameters… (More)
A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection , enabled by an initial, low-quality batch translation. In contrast to other approaches which require specialized meta-data, the system uses only the textual content of the documents. Results are… (More)
When translating among languages that differ substantially in word order, machine translation (MT) systems benefit from syntactic pre-ordering—an approach that uses features from a syntactic parse to permute source words into a target-language-like order. This paper presents a method for inducing parse trees automatically from a parallel corpus, instead of… (More)
In statistical language modeling, one technique to reduce the problematic effects of data spar-sity is to partition the vocabulary into equivalence classes. In this paper we investigate the effects of applying such a technique to higher-order n-gram models trained on large corpora. We introduce a modification of the exchange clustering algorithm with… (More)
We present LLCCM, a log-linear variant of the constituent context model (CCM) of grammar induction. LLCCM retains the simplicity of the original CCM but extends robustly to long sentences. On sentences of up to length 40, LLCCM outperforms CCM by 13.9% bracketing F1 and outperforms a right-branching baseline in regimes where CCM does not.
Temporal resolution systems are traditionally tuned to a particular language, requiring significant human effort to translate them to new languages. We present a language independent semantic parser for learning the interpretation of temporal phrases given only a corpus of utterances and the times they reference. We make use of a latent parse that encodes a… (More)
Reading an article and answering questions about its content is a fundamental task for natural language understanding. While most successful neural approaches to this problem rely on recurrent neural networks (RNNs), training RNNs over long documents can be prohibitively slow. We present a novel framework for question answering that can efficiently scale to… (More)
As a prerequisite to translation of poetry, we implement the ability to produce translations with meter and rhyme for phrase-based MT, examine whether the hypothesis space of such a system is flexible enough to accomodate such constraints, and investigate the impact of such constraints on translation quality.