• Publications
Task-based Evaluation of Multiword Expressions: a Pilot Study in Statistical Machine Translation
Two different integration strategies for MWEs in SMT are proposed, which take advantage of different degrees of MWE semantic compositionality and yield complementary improvements in SMT quality on a large-scale translation task.
Improving Statistical Machine Translation Using Word Sense Disambiguation
This paper investigates a new strategy for integrating WSD into an SMT system, which performs fully phrasal multi-word disambiguation, and provides the first known empirical evidence that lexical semantics is indeed useful for SMT, despite claims to the contrary.
Word Sense Disambiguation vs. Statistical Machine Translation
It is found that word sense disambiguation does not yield significantly better translation quality than the statistical machine translation system alone.
Multi-Task Neural Models for Translating Between Styles Within and Across Languages
This work proposes to jointly solve two related tasks on generating text of varying formality, monolingual formality transfer and formality-sensitive machine translation, using multi-task learning, and shows that the models achieve state-of-the-art performance for formality transfer and can perform formality-sensitive translation without being explicitly trained on style-annotated translation examples.
The NRC System for Discriminating Similar Languages
This work describes the system built by the National Research Council Canada for the "Discriminating between Similar Languages" (DSL) shared task, which uses various statistical classifiers and makes predictions based on a two-stage process to reach the best performance among all systems submitted to the open and closed tasks.
An Empirical Exploration of Curriculum Learning for Neural Machine Translation
A probabilistic view of curriculum learning is adopted, which allows flexible evaluation of the impact of curriculum design, and an extensive exploration on a German-English translation task shows it is possible to improve convergence time at no loss in translation quality.
SemEval-2016 Task 10: Detecting Minimal Semantic Units and their Meanings (DiMSUM)
This task combines the labeling of multiword expressions and supersenses (coarse-grained classes) in an explicit, yet broad-coverage paradigm for lexical semantics in a multi-domain evaluation setting, indicating that the task remains largely unresolved.
Curriculum Learning for Domain Adaptation in Neural Machine Translation
This work introduces a curriculum learning approach to adapt generic neural machine translation models to a specific domain and consistently outperforms both unadapted and adapted baselines in experiments with two distinct domains and two language pairs.
A Stacked, Voted, Stacked Model for Named Entity Recognition
This paper investigates stacking and voting methods for combining strong classifiers like boosting, SVM, and TBL, on the named-entity recognition task. We demonstrate several effective approaches,
Measuring Machine Translation Errors in New Domains
We develop two techniques for analyzing the effect of porting a machine translation system to a new domain. One is a macro-level analysis that measures how domain shift affects corpus-level