Publications
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
TLDR
This work expresses self-attention as a linear dot-product of kernel feature maps and uses the associativity of matrix products to reduce the complexity from O(N²) to O(N), where N is the sequence length.
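A minimal NumPy sketch of this idea (the elu(x)+1 feature map is the paper's; the shapes and names are illustrative):

    import numpy as np

    def elu(x):
        return np.where(x > 0, x, np.exp(x) - 1)

    def linear_attention(Q, K, V):
        # phi(x) = elu(x) + 1, the feature map used in the paper.
        Qp, Kp = elu(Q) + 1, elu(K) + 1          # (N, d) each
        # Associativity: (Qp @ Kp.T) @ V == Qp @ (Kp.T @ V); the right-hand
        # grouping costs O(N * d^2) instead of O(N^2 * d).
        S = Kp.T @ V                             # (d, d_v)
        z = Kp.sum(axis=0)                       # (d,) normalization terms
        return (Qp @ S) / (Qp @ z)[:, None]      # (N, d_v)

    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
    print(linear_attention(Q, K, V).shape)       # (6, 4)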
Document-Level Neural Machine Translation with Hierarchical Attention Networks
TLDR
Experiments show that hierarchical attention significantly improves the BLEU score over a strong NMT baseline and over the state of the art in context-aware methods, and that both the encoder and decoder benefit from context in complementary ways.
Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology
TLDR
It is shown how this pipeline can be applied on a social multimedia platform for the creation of a large-scale multilingual visual sentiment concept ontology (MVSO), which is organized hierarchically by multilingual clusters of visually detectable nouns and subclusters of emotionally biased versions of these nouns.
Random Feature Attention
TLDR
RFA, a linear time and space attention that uses random feature methods to approximate the softmax function, is proposed and explored, showing that RFA is competitive in terms of both accuracy and efficiency on three long text classification datasets.
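A sketch of the random-feature trick in the same NumPy style (the sin/cos features and unit-norm scaling follow the standard random Fourier feature construction; dimensions are illustrative, and the estimate is noisy for small D):

    import numpy as np

    def rf_features(X, W):
        # Random Fourier features: phi(x).phi(y) ~= exp(-||x - y||^2 / 2)
        # when the rows of W are drawn from N(0, I).
        P = X @ W.T                                    # (N, D)
        return np.concatenate([np.sin(P), np.cos(P)], axis=-1) / np.sqrt(W.shape[0])

    def random_feature_attention(Q, K, V, D=256, seed=0):
        # With unit-norm q and k, exp(q.k) is proportional to
        # exp(-||q - k||^2 / 2), so the kernel estimate approximates softmax
        # weights up to a constant that cancels in the normalization.
        Q = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
        K = K / np.linalg.norm(K, axis=-1, keepdims=True)
        W = np.random.default_rng(seed).standard_normal((D, Q.shape[-1]))
        Qp, Kp = rf_features(Q, W), rf_features(K, W)  # (N, 2D) each
        return (Qp @ (Kp.T @ V)) / (Qp @ Kp.sum(axis=0))[:, None]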
Sentiment analysis of user comments for one-class collaborative filtering over TED talks
TLDR
A sentiment-aware nearest-neighbor model (SANN) for multimedia recommendations over TED talks is proposed, which makes use of user comments and significantly outperforms several competitive baselines, by more than 25% on unseen data.
Deep Encoder, Shallow Decoder: Reevaluating the Speed-Quality Tradeoff in Machine Translation
TLDR
The findings suggest that the latency disadvantage for autoregressive translation has been overestimated due to a suboptimal choice of layer allocation, and a new speed-quality baseline for future research toward fast, accurate translation is provided.
Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
TLDR
The speed disadvantage for autoregressive baselines compared to non-autoregressive methods has been overestimated in three aspects: suboptimal layer allocation, insufficient speed measurement, and lack of knowledge distillation.
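The layer-allocation argument can be made concrete with a back-of-the-envelope cost model (illustrative constants, not the papers' measurements): the encoder runs once over the source, while every decoder layer runs once per generated token.

    def autoregressive_layer_passes(enc_layers, dec_layers, tgt_len):
        # One encoder pass plus one pass through each decoder layer per token.
        return enc_layers + dec_layers * tgt_len

    for enc, dec in [(6, 6), (12, 1)]:
        print(f"{enc}-{dec}: {autoregressive_layer_passes(enc, dec, tgt_len=25)} layer passes")
    # 6-6 -> 156, 12-1 -> 37: similar total depth, far lower per-token latency.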
Explaining the Stars: Weighted Multiple-Instance Learning for Aspect-Based Sentiment Analysis
TLDR
A multiple-instance learning model is applied to predicting aspect ratings, i.e., judgments of specific properties of an item, from user-contributed texts such as product reviews; the model demonstrates interpretability and explanatory power for its predictions.
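A toy NumPy illustration of the weighted aggregation (all scores are made up; a real model would predict them from the text):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    segment_ratings = np.array([4.5, 1.0, 3.0])   # per-sentence predicted stars
    segment_logits  = np.array([0.2, 2.0, -1.0])  # per-sentence relevance scores

    weights = softmax(segment_logits)             # attention over segments
    review_rating = weights @ segment_ratings     # weighted MIL aggregation
    print(weights.round(2), review_rating.round(2))
    # The weights expose which sentences drove the predicted rating, which is
    # where the interpretability comes from.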
GILE: A Generalized Input-Label Embedding for Text Classification
TLDR
This paper proposes a new input-label model that generalizes over previous such models, addresses their limitations, and does not compromise performance on seen labels; in both scenarios it outperforms monolingual and multilingual models that do not leverage label semantics, as well as previous joint input-label space models.
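One way to picture a joint input-label scoring layer, as a hedged NumPy sketch (all parameter names and sizes are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_lab, d_joint, n_labels = 8, 5, 6, 4

    U = rng.standard_normal((d_joint, d_in))    # projects the document encoding
    W = rng.standard_normal((d_joint, d_lab))   # projects the label embedding
    v = rng.standard_normal(d_joint)            # scores the joint representation

    h = rng.standard_normal(d_in)               # encoder output for one document
    E = rng.standard_normal((n_labels, d_lab))  # label(-description) embeddings

    # Each label is scored through the shared joint space; because a score
    # depends only on the label's embedding, unseen labels can be scored too.
    scores = np.tanh(E @ W.T + h @ U.T) @ v
    print(scores.shape)                         # (4,)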
Multilingual Hierarchical Attention Networks for Document Classification
TLDR
This work proposes multilingual hierarchical attention networks for learning document structures, with shared encoders and/or shared attention mechanisms across languages, using multi-task learning and an aligned semantic space as input, and evaluates the proposed models on multilingual document classification with disjoint label sets.
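A minimal sketch of shared attention pooling in the same style (real hierarchical attention networks use recurrent encoders and learned projections, omitted here; shapes are illustrative):

    import numpy as np

    def attend(H, w):
        # Attention pooling: score the rows of H with the shared vector w and
        # return their weighted average; sharing w across languages is the
        # "shared attention mechanism" variant.
        a = H @ w
        a = np.exp(a - a.max())
        return (a / a.sum()) @ H

    rng = np.random.default_rng(0)
    d = 16
    w_word, w_sent = rng.standard_normal(d), rng.standard_normal(d)

    # Sentences x words x dim, in an aligned multilingual embedding space,
    # so the same parameters can pool documents from any language.
    doc = rng.standard_normal((3, 7, d))
    sent_vecs = np.stack([attend(S, w_word) for S in doc])  # (3, d)
    print(attend(sent_vecs, w_sent).shape)                  # (d,)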
...