Publications
Are NLP Models really able to Solve Simple Math Word Problems?
TLDR
It is shown that MWP solvers that do not have access to the question asked in the MWP can still solve a large fraction of MWPs, and that models which treat MWPs as a bag of words can also achieve surprisingly high accuracy.
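To make the "bag of words" observation concrete, here is a minimal, hypothetical baseline sketch (not the paper's setup): a classifier that sees only which words occur in the problem body, with the question sentence removed, and still predicts the required operation. The toy problems and labels are invented for illustration.

```python
# A minimal bag-of-words baseline for math word problems (MWPs).
# Toy data and labels are hypothetical; the point is only that the
# classifier never sees word order or the final question sentence.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Problem bodies with the question sentence stripped; labels are the
# operation needed to solve them (a stand-in for predicting the equation).
problems = [
    "John had 5 apples and Mary gave him 3 more",
    "A shop sold 7 pens in the morning and 2 in the evening",
    "Sara had 9 candies and ate 4 of them",
    "Tom had 6 dollars and spent 2 on a snack",
]
labels = ["add", "add", "subtract", "subtract"]

# CountVectorizer discards word order entirely: the model only sees
# which words occur, which is exactly the bag-of-words setting.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(problems)
clf = LogisticRegression().fit(X, labels)

test = ["Lily had 8 stickers and lost 3 of them"]
print(clf.predict(vectorizer.transform(test)))  # toy model; output is illustrative
```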
Submodular Optimization-based Diverse Paraphrasing and its Effectiveness in Data Augmentation
TLDR
This work provides a novel formulation of the problem in terms of monotone submodular function maximization, specifically targeted towards the task of paraphrasing, and demonstrates the effectiveness of the method for data augmentation on multiple tasks such as intent classification and paraphrase recognition.
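As a rough sketch of the general technique named in the TLDR (greedy maximization of a monotone submodular function under a cardinality constraint, which carries the classic 1 - 1/e guarantee), the example below selects a diverse subset of paraphrase candidates. The facility-location objective and Jaccard similarity are illustrative choices, not the paper's exact formulation.

```python
# Greedy selection of k diverse paraphrase candidates by maximizing a
# monotone submodular facility-location objective (illustrative choice).

def jaccard(a, b):
    """Word-overlap similarity between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def facility_location(selected, candidates):
    """f(S) = sum over candidates of how well S covers them (monotone submodular)."""
    if not selected:
        return 0.0
    return sum(max(jaccard(c, s) for s in selected) for c in candidates)

def greedy_select(candidates, k):
    """At each step add the candidate with the largest marginal gain."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda c: facility_location(selected + [c], candidates)
                          - facility_location(selected, candidates),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

candidates = [
    "how do i reset my password",
    "how can i reset my password",
    "i forgot my password, what should i do",
    "what is the procedure to change my account password",
    "password reset steps please",
]
print(greedy_select(candidates, k=2))
```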
On the Ability and Limitations of Transformers to Recognize Formal Languages
TLDR
This work systematically studies the ability of Transformers to model such languages, as well as the role of their individual components in doing so, and provides insights on the role of the self-attention mechanism in modeling certain behaviors and on the influence of positional encoding schemes on the learning and generalization abilities of the model.
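To make "positional encoding scheme" concrete, the sketch below implements the standard absolute sinusoidal encoding from the original Transformer; it is background context only, not code from the paper, whose contribution is comparing how such schemes affect learning and generalization.

```python
# Standard absolute sinusoidal positional encoding (Vaswani et al., 2017):
# PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d)).
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Return an array of shape (max_len, d_model) with sin/cos position codes."""
    positions = np.arange(max_len)[:, None]             # (max_len, 1)
    dims = np.arange(d_model)[None, :]                  # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])               # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])               # odd dimensions use cosine
    return pe

print(sinusoidal_positional_encoding(max_len=6, d_model=8).round(3))
```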
On the Computational Power of Transformers and Its Implications in Sequence Modeling
TLDR
This paper provides an alternative and simpler proof that vanilla Transformers are Turing-complete, and proves that Transformers with only positional masking and without any positional encoding are also Turing-complete.
On the Ability of Self-Attention Networks to Recognize Counter Languages
TLDR
This work systematically studies the ability of Transformers to model such languages, as well as the role of their individual components in doing so, and the influence of positional encoding schemes on the learning and generalization ability of the model.
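For reference, Dyck-1 (balanced strings over a single bracket pair) is a prototypical counter language: one counter suffices to recognize it. The check below is the standard ground-truth recognizer one could compare a trained self-attention model against; it is not code from the paper.

```python
# Dyck-1 membership check using a single counter.

def is_dyck1(s: str) -> bool:
    """Accept iff the string of '(' and ')' is balanced."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:          # a ')' appeared with no matching '(' so far
            return False
    return depth == 0          # every '(' must eventually be closed

for s in ["(())()", "(()", "())("]:
    print(s, is_dyck1(s))
```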
Unsung Challenges of Building and Deploying Language Technologies for Low Resource Language Communities
TLDR
This paper examines and analyzes the challenges associated with developing and introducing language technologies to low-resource language communities, and describes essential factors on which the success of such technologies hinges.
On the Practical Ability of Recurrent Neural Networks to Recognize Hierarchical Languages
TLDR
This work studies the performance of recurrent models on Dyck-n languages, a particularly important and well-studied class of CFLs, and finds that while recurrent models generalize nearly perfectly when the lengths of the training and test strings come from the same range, they perform poorly when the test strings are longer.
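The sketch below shows the stack-based ground-truth recognizer for Dyck-n (which, unlike Dyck-1, genuinely needs a stack) together with a simple short-train / long-test split of the kind under which recurrent models are reported to degrade. The bracket inventory, generator, and length cutoff are illustrative, not the paper's exact configuration.

```python
# Dyck-n ground-truth check and a length-based train/test split (illustrative).
import random

PAIRS = {"(": ")", "[": "]", "{": "}"}   # Dyck-3 for illustration

def is_dyck(s: str) -> bool:
    """Stack-based membership check for Dyck-n strings."""
    stack = []
    for ch in s:
        if ch in PAIRS:
            stack.append(ch)
        elif not stack or PAIRS[stack.pop()] != ch:
            return False
    return not stack

def random_dyck(n_pairs: int) -> str:
    """Generate a well-nested string containing exactly n_pairs bracket pairs."""
    if n_pairs == 0:
        return ""
    inside = random.randint(0, n_pairs - 1)        # pairs nested inside the outer pair
    opener = random.choice(list(PAIRS))
    return (opener + random_dyck(inside) + PAIRS[opener]
            + random_dyck(n_pairs - 1 - inside))

strings = [random_dyck(random.randint(1, 30)) for _ in range(1000)]
train = [s for s in strings if len(s) <= 20]       # shorter strings for training
test = [s for s in strings if len(s) > 20]         # longer strings for testing
print(len(train), len(test), all(is_dyck(s) for s in strings))
```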
Revisiting the Compositional Generalization Abilities of Neural Sequence Models
TLDR
It is demonstrated that modifying the training distribution in simple and intuitive ways enables standard seq-to-seq models to achieve near-perfect generalization performance, thereby showing that their compositional generalization abilities were previously underestimated.