Publications
ParsiNLU: A Suite of Language Understanding Challenges for Persian
TLDR
This work introduces ParsiNLU, the first benchmark for the Persian language covering a range of language understanding tasks (reading comprehension, textual entailment, and others), presents the first results of state-of-the-art monolingual and multilingual pre-trained language models on the benchmark, and compares them with human performance.
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
TLDR
Evaluation of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters, finds that model performance and calibration both improve with scale but remain poor in absolute terms.
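For context on the calibration claim, here is a minimal sketch of expected calibration error (ECE), one common way calibration is quantified; the binning scheme is illustrative and not necessarily the metric used in the paper.

```python
# Sketch: expected calibration error (ECE), the average gap between accuracy
# and confidence over equal-width confidence bins. Assumption: a generic
# illustration of "calibration", not the exact metric used on BIG-bench.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return ece

# Toy usage: an overconfident model shows a large accuracy-confidence gap.
print(expected_calibration_error([0.9, 0.8, 0.95, 0.7], [1, 0, 1, 0]))
```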
Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation
TLDR
It is revealed that fine-tuning only the cross-attention parameters is nearly as effective as fine-tuning all parameters (i.e., the entire translation model) in terms of mitigating catastrophic forgetting, enabling zero-shot translation, and extending machine translation models to several new language pairs with reduced parameter storage overhead.
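A minimal sketch of this recipe, assuming a Hugging Face Marian-style encoder-decoder in which the decoder's cross-attention modules are named `encoder_attn`; the checkpoint and the name filter are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: freeze a pretrained translation model and fine-tune only its
# cross-attention parameters. Assumptions: Hugging Face transformers with
# a MarianMT checkpoint, whose decoder cross-attention modules are named
# "encoder_attn"; the paper's exact models and filters may differ.
from transformers import MarianMTModel

model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-de")

for name, param in model.named_parameters():
    # Only cross-attention weights stay trainable; everything else is frozen.
    # (When adapting to a new language, its embeddings would also stay trainable.)
    param.requires_grad = "encoder_attn" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")
```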
A Universal Parent Model for Low-Resource Neural Machine Translation Transfer
TLDR
It is demonstrated that the approach, which leverages orthography unification and broad-coverage subword identification, generalizes well to several languages from a variety of families, and that translation systems using it can be built more quickly, and with better quality, than with competing methods.
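A rough sketch of the general recipe, assuming transliteration to a common Latin script (via the unidecode package, an assumption) followed by a shared SentencePiece subword vocabulary learned over the pooled corpora; the paper's actual unification and subword methods may differ.

```python
# Sketch: build a shared "parent" subword vocabulary by (1) unifying
# orthography across languages and (2) learning subwords over the pooled text.
# Assumptions: unidecode stands in for orthography unification and
# SentencePiece for subword identification; the corpus paths are hypothetical
# and this is not necessarily the paper's exact pipeline.
import sentencepiece as spm
from unidecode import unidecode

corpora = ["parent_lang.txt", "child_lang.txt"]  # hypothetical corpus files

# 1) Orthography unification: transliterate every line to a common Latin form.
with open("unified.txt", "w", encoding="utf-8") as out:
    for path in corpora:
        with open(path, encoding="utf-8") as f:
            for line in f:
                out.write(unidecode(line))

# 2) Broad-coverage subword identification on the unified corpus.
spm.SentencePieceTrainer.train(
    input="unified.txt",
    model_prefix="shared_subwords",
    vocab_size=16000,
    model_type="bpe",
    character_coverage=1.0,
)
```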
On the Strengths of Cross-Attention in Pretrained Transformers for Machine Translation
TLDR
It is found that, apart from the new language’s embeddings, only the cross-attention parameters need to be fine-tuned to obtain competitive BLEU performance.
Unsupervised Product Entity Resolution using Graph Representation Learning
TLDR
Preliminary results are reported on an unsupervised product ER system that is simple and extremely lightweight and that, by leveraging a combination of text and neural graph embeddings, achieves mean rank reductions of 50-70% on some challenging product ER benchmarks compared to a text-only baseline.
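A minimal sketch of the general idea, assuming precomputed text and graph embeddings that are simply concatenated and compared by cosine similarity; the embedding models and combination scheme here are illustrative, not the system described in the paper.

```python
# Sketch: unsupervised product matching by combining text and graph embeddings,
# then ranking candidate products by cosine similarity to a query product.
# Assumptions: the embeddings are precomputed elsewhere and simply concatenated;
# this illustrates the general idea, not the paper's exact system.
import numpy as np

def combine(text_emb, graph_emb):
    """Concatenate L2-normalized text and graph embeddings."""
    t = text_emb / np.linalg.norm(text_emb)
    g = graph_emb / np.linalg.norm(graph_emb)
    return np.concatenate([t, g])

def rank_candidates(query, candidates):
    """Return candidate indices sorted by cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    sims = [float(q @ (c / np.linalg.norm(c))) for c in candidates]
    return np.argsort(sims)[::-1]

# Toy usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
query = combine(rng.normal(size=64), rng.normal(size=32))
cands = [combine(rng.normal(size=64), rng.normal(size=32)) for _ in range(5)]
print(rank_candidates(query, cands))
```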
Know Where You're Going: Meta-Learning for Parameter-Efficient Fine-tuning
TLDR
This work shows that taking the ultimate choice of fine-tuning method into consideration boosts the performance of parameter-efficient fine-tuning, and relies on optimization-based meta-learning using MAML with certain modifications for its distinct purpose.
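A first-order MAML-style sketch of the general idea, in which the inner loop simulates the eventual parameter-efficient fine-tuning by adapting only a small adapter; the toy model, task sampler, and hyperparameters are illustrative assumptions rather than the paper's exact modifications.

```python
# Sketch: first-order MAML where the inner loop mimics the eventual
# parameter-efficient fine-tuning (only a small adapter is adapted per task).
# Assumptions: a toy regression model and synthetic tasks; this illustrates
# the general recipe, not the paper's specific modifications to MAML.
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)  # meta-learned, frozen during adaptation
        self.adapter = nn.Linear(8, 1)   # the parameter-efficient part

    def features(self, x):
        return torch.relu(self.backbone(x))

def sample_task(n=16):
    """Toy task: linear regression with a random weight vector."""
    w = torch.randn(8, 1)
    x = torch.randn(n, 8)
    return x, x @ w

model = Model()
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(100):
    meta_opt.zero_grad()
    for _ in range(4):  # tasks per meta-batch
        x_sup, y_sup = sample_task()
        x_qry, y_qry = sample_task()
        # Inner loop: adapt only the adapter, as downstream PEFT would do.
        fast = {k: p.clone() for k, p in model.adapter.named_parameters()}
        for _ in range(3):
            pred = model.features(x_sup) @ fast["weight"].t() + fast["bias"]
            grads = torch.autograd.grad(loss_fn(pred, y_sup), list(fast.values()))
            fast = {k: p - 0.01 * g for (k, p), g in zip(fast.items(), grads)}
        # Outer loop: query loss through the adapted adapter (first-order MAML).
        pred_q = model.features(x_qry) @ fast["weight"].t() + fast["bias"]
        loss_fn(pred_q, y_qry).backward()
    meta_opt.step()
```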