Nitish Shirish Keskar
Claim Your Author Page
Ensure your research is discoverable on Semantic Scholar. Claiming your author page allows you to personalize the information displayed and manage your publications. Semantic Scholar automatically creates author pages based on data aggregated from public sources and our publisher partners.
Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language… Continue Reading
The stochastic gradient descent (SGD) method and its variants are algorithms of choice for many Deep Learning tasks. These methods operate in a small-batch regime wherein a fraction of the training… Continue Reading
Presented on August 28, 2018 at 12:15 p.m. in the Pettit Microelectronics Research Center, Room 102 A/B.
Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word level language models based on LSTMs and QRNNs and… Continue Reading
Large-scale language models show promising text generation capabilities, but users cannot easily control particular aspects of the generated text. We release CTRL, a 1.63 billion-parameter… Continue Reading
Text summarization aims at compressing long documents into a shorter form that conveys the most important parts of the original document. Despite increased interest in the community and notable… Continue Reading
Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad or RMSprop have been found to generalize poorly compared to Stochastic gradient descent (SGD). These methods… Continue Reading
State-of-the-art results on neural machine translation often use attentional sequence-to-sequence models with some form of convolution or recursion. Vaswani et. al. (2017) propose a new architecture… Continue Reading
End-to-end neural models have made significant progress in question answering, however recent studies show that these models implicitly assume that the answer and evidence appear close together in a… Continue Reading
Methods for distributed optimization have received significant attention in recent years owing to their wide applicability in various domains including machine learning, robotics, and sensor… Continue Reading