Learn More
Naturally-occurring instances of linguistic phenomena are important both for training and for evaluating automatic text processing. When available in large quantities, they also prove interesting material for linguistic studies. In this article, we present WiCoPaCo (Wikipedia Correction and Paraphrase Corpus), a new freely-available resource built by(More)
Using multi-layer neural networks to estimate the probabilities of word sequences is a promising research area in statistical language modeling, with applications in speech recognition and statistical machine translation. However, training such models for large vocabulary tasks is computationally challenging which does not scale easily to the huge corpora(More)
Extant Statistical Machine Translation (SMT) systems are very complex softwares, which embed multiple layers of heuristics and embark very large numbers of numerical parameters. As a result, it is difficult to analyze output translations and there is a real need for tools that could help developers to better understand the various causes of errors. In this(More)
In this paper, we present a straightforward strategy for transferring dependency parsers across languages. The proposed method learns a parser from partially annotated data obtained through the projection of annotations across unambiguous word alignments. It does not rely on any modeling of the reliability of dependency and/or alignment links and is(More)
The search space of Phrase-Based Statistical Machine Translation (PBSMT) systems can be represented under the form of a directed acyclic graph (lattice). The quality of this search space can thus be evaluated by computing the best achievable hypothesis in the lattice, the so-called oracle hypothesis. For common SMT metrics, this problem is however NP-hard(More)
Université Paris 6 LIP6 8 rue du capitaine Scott 75015 PARIS – France ABSTRACT Querying heterogeneous XML document collections is an open problem. This will require building some sort of correspondence between the DTD of the different sources. We consider here the problem of matching the structure of XML documents from different sources. We introduce for(More)
The widespread use of XML has urged the need to develop tools to efficiently store, access and organize XML corpus. The INEX initiative has resulted in major improvements in XML retrieval systems, but today, related tasks, like categorization or structure matching, should be investigated. We consider here the problem of clustering XML documents using their(More)
The neurotransmitter gamma-aminobutyric acid (GABA) appears to be involved in the control of gonadotropin secretion. These studies were conducted 1) to evaluate the effect of GABAergic drugs on in vitro LHRH secretion and 2) to characterize the role of different types of GABA receptors (the GABA-A and GABA-B subtypes) in these actions. Arcuate nuclei-median(More)
We present a novel translation quality informed procedure for both extraction and scoring of phrase pairs in PBSMT systems. We reformulate the extraction problem in the supervised learning framework in an attempt to take into account the translation quality, while incorporating arbitrary features in order to circumvent alignment errors. One-Class SVMs and(More)
In Machine Translation, it is customary to compute the model score of a predicted hypothesis as a linear combination of multiple features, where each feature assesses a particular facet of the hypothesis. The choice of a linear combination is usually justified by the possibility of efficient inference (decoding); yet, the appropriateness of this simple(More)