Publications
Language Models for Image Captioning: The Quirks and What Works
TLDR
By combining key aspects of the ME and RNN methods, this paper achieves a new record performance over previously published results on the benchmark COCO dataset; however, the gains the authors see in BLEU do not translate to human judgments.
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
TLDR
It is shown that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.
Bi-directional Attention with Agreement for Dependency Parsing
TLDR
A novel bi-directional attention model for dependency parsing learns to agree on headword predictions from the forward and backward parsing directions, achieving state-of-the-art unlabeled attachment scores on 6 languages.
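As a rough illustration of the agreement idea (not the paper's exact formulation), here is a minimal PyTorch sketch: assume each parsing direction produces, for every token, scores over candidate head positions, and a symmetric KL penalty pushes the two distributions together. The function name and the penalty form are assumptions.

```python
import torch.nn.functional as F

def agreement_loss(fwd_head_logits, bwd_head_logits):
    """Symmetric KL penalty encouraging the forward and backward
    parsers to agree on each token's headword distribution.
    Both inputs have shape (batch, seq_len, seq_len): for every
    dependent token, a score over candidate head positions.
    """
    p = F.log_softmax(fwd_head_logits, dim=-1)
    q = F.log_softmax(bwd_head_logits, dim=-1)
    # F.kl_div(input, target) expects log-probs as input, probs as target,
    # so kl_div(q, p.exp()) is KL(p || q). Symmetrize the two directions.
    kl_pq = F.kl_div(q, p.exp(), reduction="batchmean")
    kl_qp = F.kl_div(p, q.exp(), reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)
```

In training, such a term would typically be added to the supervised parsing loss with a tunable weight.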
Sounding Board: A User-Centric and Content-Driven Social Chatbot
TLDR
The system architecture consists of several components including spoken language processing, dialogue management, language generation, and content management, with emphasis on user-centric and content-driven design.
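A minimal sketch of how one conversational turn might flow through such a modular pipeline; every component interface below is an illustrative assumption, not the system's actual API.

```python
def respond(user_audio, asr, nlu, dialogue_manager, generator, content_db):
    """One turn through a Sounding Board-style modular pipeline."""
    text = asr(user_audio)                # spoken language processing
    intent = nlu(text)                    # interpret user intent
    action = dialogue_manager(intent)     # choose the next dialogue act
    content = content_db.lookup(action)   # content management
    return generator(action, content)     # language generation
```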
Probabilistic Assumptions Matter: Improved Models for Distantly-Supervised Document-Level Question Answering
TLDR
This work addresses the problem of extractive question answering using document-level distant supervision, pairing questions and relevant documents with answer strings, and demonstrates that a multi-objective model can efficiently combine the advantages of multiple assumptions and outperform the best individual formulation.
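To make "combining multiple assumptions" concrete, a hedged sketch of blending two common distant-supervision objectives over candidate answer spans: marginal likelihood (at least one string-matching span is correct) and a hard, max-style objective (exactly one is). The mixing weight and function name are assumptions, not the paper's exact recipe.

```python
import torch

def multi_objective_span_loss(span_log_probs, gold_mask, alpha=0.5):
    """span_log_probs: (batch, num_spans) log-probability of each span.
    gold_mask: (batch, num_spans) bool, True where the span's text
    matches the distantly-supervised answer string (assumed nonempty).
    """
    neg_inf = torch.finfo(span_log_probs.dtype).min
    masked = span_log_probs.masked_fill(~gold_mask, neg_inf)
    # "At least one matching span is correct": marginalize over matches.
    marginal_nll = -torch.logsumexp(masked, dim=-1).mean()
    # "Exactly one is correct": credit only the highest-scoring match.
    hard_nll = -masked.max(dim=-1).values.mean()
    return alpha * marginal_nll + (1.0 - alpha) * hard_nll
```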
Adversarial Training for Large Neural Language Models
TLDR
It is shown that adversarial pre-training can improve both generalization and robustness, and a general algorithm, ALUM (Adversarial training for large neural LangUage Models), is proposed that regularizes the training objective by applying perturbations in the embedding space that maximize the adversarial loss.
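A minimal sketch of an embedding-space adversarial regularizer in the spirit of ALUM: take one gradient-ascent step on a perturbation of the input embeddings to maximize the divergence between clean and perturbed predictions, then penalize that divergence. The single ascent step, step sizes, and the assumption that `model` accepts `inputs_embeds` and returns logits are all simplifications, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def adversarial_regularizer(model, input_embeds, attention_mask,
                            eps=1e-5, step_size=1e-3):
    with torch.no_grad():
        clean_logits = model(inputs_embeds=input_embeds,
                             attention_mask=attention_mask)
    # Start from a small random perturbation of the embeddings.
    delta = torch.zeros_like(input_embeds).normal_(0, eps).requires_grad_()
    adv_logits = model(inputs_embeds=input_embeds + delta,
                       attention_mask=attention_mask)
    adv_kl = F.kl_div(F.log_softmax(adv_logits, dim=-1),
                      F.softmax(clean_logits, dim=-1),
                      reduction="batchmean")
    # Ascend on delta so the perturbation maximizes the divergence.
    grad, = torch.autograd.grad(adv_kl, delta)
    delta = (delta + step_size * grad.sign()).detach()
    adv_logits = model(inputs_embeds=input_embeds + delta,
                       attention_mask=attention_mask)
    # Penalizing this term regularizes the model toward smooth predictions.
    return F.kl_div(F.log_softmax(adv_logits, dim=-1),
                    F.softmax(clean_logits, dim=-1),
                    reduction="batchmean")
```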
Open-Domain Name Error Detection using a Multi-Task RNN
TLDR
A multi-task recurrent neural network language model for sentence-level name detection is proposed for use in combination with out-of-vocabulary word detection, showing a 26% improvement in name-error detection F-score over a system using n-gram lexical features.
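The multi-task setup can be pictured as one recurrent trunk with two output heads; the sketch below is an illustrative assumption (layer sizes, binary tagging scheme, and class names are not from the paper).

```python
import torch.nn as nn

class MultiTaskRNNLM(nn.Module):
    """Shared LSTM trunk with two heads: next-word prediction (LM)
    and a per-token name/not-name tag."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.lm_head = nn.Linear(hidden_dim, vocab_size)  # language modeling
        self.name_head = nn.Linear(hidden_dim, 2)         # name detection

    def forward(self, token_ids):
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.lm_head(hidden), self.name_head(hidden)
```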
Approximate Low-Rank Tensor Learning
Many real-world data arise naturally in the form of tensors, i.e., multi-dimensional arrays. Equipped with an appropriate notion of low-rankness, learning algorithms can benefit greatly from …
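The abstract is truncated above; as background on what "low-rankness" commonly means for tensors, here is the standard CP (canonical polyadic) notion for a third-order tensor. The paper may adopt a different notion (e.g., Tucker or multilinear rank).

```latex
% A third-order tensor \mathcal{X} \in \mathbb{R}^{I \times J \times K}
% has CP rank at most R if it decomposes into R rank-one terms:
\mathcal{X} = \sum_{r=1}^{R} a_r \otimes b_r \otimes c_r,
\qquad a_r \in \mathbb{R}^{I},\; b_r \in \mathbb{R}^{J},\; c_r \in \mathbb{R}^{K},
% and its rank is the smallest R for which this holds exactly.
```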
The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
TLDR
MT-DNN is presented, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models, designed to facilitate rapid customization for a broad spectrum of NLU tasks.
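The design MT-DNN facilitates is a shared text encoder with lightweight task-specific heads. The sketch below is not the toolkit's actual API; the class, its parameters, and the assumption that `encoder` maps token ids to a pooled vector are all hypothetical.

```python
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    """One shared encoder, one small head per NLU task."""
    def __init__(self, encoder, hidden_dim, task_num_labels):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_dim, n)
            for task, n in task_num_labels.items()
        })

    def forward(self, token_ids, task):
        pooled = self.encoder(token_ids)  # (batch, hidden_dim)
        return self.heads[task](pooled)   # task-specific logits
```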
Targeted Adversarial Training for Natural Language Understanding
TLDR
Experiments show that TAT can significantly improve accuracy over standard adversarial training on GLUE and attain new state-of-the-art zero-shot results on XNLI.