Frame-Semantic Parsing
A two-stage statistical model that maps lexical targets in their sentential contexts to frame-semantic structures, producing qualitatively better structures than naïve local predictors and outperforming the prior state of the art by significant margins.
From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification
Sparsemax, a new activation function similar to the traditional softmax but able to output sparse probabilities, is proposed, and an unexpected connection between its associated loss and the Huber classification loss is revealed.
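The sparsemax transformation described above is a Euclidean projection onto the probability simplex, computable in closed form after sorting. A minimal NumPy sketch (not the authors' reference implementation) illustrating how it produces exactly zero probabilities:

```python
import numpy as np

def sparsemax(z):
    """Sparsemax: Euclidean projection of scores z onto the probability simplex.

    Unlike softmax, low-scoring entries can receive exactly zero probability.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # scores in descending order
    cssv = np.cumsum(z_sorted)           # cumulative sums of sorted scores
    k = np.arange(1, len(z) + 1)
    # Largest k such that the k-th sorted score is still in the support.
    support = 1 + k * z_sorted > cssv
    k_star = k[support][-1]
    tau = (cssv[k_star - 1] - 1.0) / k_star   # threshold
    return np.maximum(z - tau, 0.0)

print(sparsemax([2.0, 1.0, -1.0]))  # the low-scoring entries are exactly 0
```

Note how, in contrast to softmax, the output assigns exact zeros to entries below the threshold tau, which is what makes the resulting attention distributions sparse.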
Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
Fast, accurate, direct non-projective dependency parsers with third-order features, with parsing speeds competitive with projective parsers and state-of-the-art accuracies on the largest datasets (English, Czech, and German).
Marian: Fast Neural Machine Translation in C++
Marian is an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs that can achieve high training and translation speed.
An Augmented Lagrangian Approach to Constrained MAP Inference
This work proposes a new algorithm for approximate MAP inference on factor graphs, combining augmented Lagrangian optimization with dual decomposition; the resulting method is provably convergent, parallelizable, and suitable for fine decompositions of the graph.
Sparse Sequence-to-Sequence Models
Sparse sequence-to-sequence models are proposed, rooted in a new family of α-entmax transformations, which includes softmax and sparsemax as particular cases and is sparse for any α > 1.
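The α-entmax family interpolates between softmax (α = 1) and sparsemax (α = 2). A hedged sketch of one common way to compute it numerically — bisection on the normalization threshold tau, not the exact sorting-based algorithm from the paper — assuming the standard characterization p_i = [(α−1)z_i − τ]₊^(1/(α−1)) with Σp = 1:

```python
import numpy as np

def entmax_bisect(z, alpha=1.5, n_iter=50):
    """alpha-entmax via bisection on the threshold tau (illustrative sketch).

    alpha -> 1 recovers softmax; alpha = 2 recovers sparsemax;
    any alpha > 1 yields sparse probability vectors.
    """
    z = np.asarray(z, dtype=float) * (alpha - 1.0)
    # tau lies in [max(z) - 1, max(z)]: sum(p) >= 1 at the lower end, 0 at the upper.
    lo, hi = z.max() - 1.0, z.max()
    for _ in range(n_iter):
        tau = (lo + hi) / 2.0
        p = np.maximum(z - tau, 0.0) ** (1.0 / (alpha - 1.0))
        if p.sum() < 1.0:
            hi = tau      # threshold too large, mass below 1
        else:
            lo = tau      # threshold too small, mass at least 1
    p = np.maximum(z - (lo + hi) / 2.0, 0.0) ** (1.0 / (alpha - 1.0))
    return p / p.sum()    # renormalize away residual bisection error
```

As a sanity check, running it with alpha=2 matches the sparsemax projection, and intermediate values such as alpha=1.5 give distributions that are sparser than softmax but smoother than sparsemax.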
Selective Attention for Context-aware Neural Machine Translation
This work proposes a novel and scalable top-down approach to hierarchical attention for context-aware NMT which uses sparse attention to selectively focus on relevant sentences in the document context and then attends to key words in those sentences.
Adaptively Sparse Transformers
This work introduces the adaptively sparse Transformer, wherein attention heads have flexible, context-dependent sparsity patterns, accomplished by replacing softmax with α-entmax: a differentiable generalization of softmax that allows low-scoring words to receive precisely zero weight.
Concise Integer Linear Programming Formulations for Dependency Parsing
This formulation of non-projective dependency parsing as a polynomial-sized integer linear program handles non-local output features efficiently, is compatible with prior knowledge encoded as hard constraints, and can also learn soft constraints from data.
OpenKiwi: An Open Source Framework for Quality Estimation
We introduce OpenKiwi, a PyTorch-based open source framework for translation quality estimation. OpenKiwi supports training and testing of word-level and sentence-level quality estimation systems.