• Publications
Unified Language Model Pre-training for Natural Language Understanding and Generation
TLDR
A new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks is presented; it compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks.
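The unified modeling comes down to sharing one Transformer across objectives while swapping the self-attention mask. A minimal sketch of that idea, with illustrative helper names and no claim to match the released UniLM code:

```python
# A minimal sketch (not the authors' code) of UniLM's core idea: one Transformer,
# different self-attention masks per pre-training objective.
import torch

def bidirectional_mask(seq_len: int) -> torch.Tensor:
    # Every token may attend to every other token (BERT-style NLU objective).
    return torch.ones(seq_len, seq_len)

def left_to_right_mask(seq_len: int) -> torch.Tensor:
    # Each token attends only to itself and the tokens on its left (unidirectional LM).
    return torch.tril(torch.ones(seq_len, seq_len))

def seq2seq_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    # Source tokens attend bidirectionally within the source; target tokens attend
    # to the whole source plus the already-generated target prefix.
    total = src_len + tgt_len
    mask = torch.zeros(total, total)
    mask[:, :src_len] = 1.0                                              # everyone sees the source
    mask[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len))  # causal target prefix
    return mask

if __name__ == "__main__":
    print(seq2seq_mask(3, 2))  # 5x5 mask mixing bidirectional and causal attention
```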
Gated Self-Matching Networks for Reading Comprehension and Question Answering
TLDR
Gated self-matching networks for reading comprehension style question answering, which aims to answer questions from a given passage, are presented; the model holds first place on the SQuAD leaderboard in both the single-model and ensemble settings.
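The gating idea pairs each passage word with a question-aware attention context and scales the resulting RNN input with a learned sigmoid gate. A rough PyTorch sketch of one such gated-attention step, with assumed module and dimension names:

```python
# A hedged sketch of the gated attention input step: the gate, computed from the
# passage word and its attended question context, scales the recurrent input.
import torch
import torch.nn as nn

class GatedAttentionInput(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.attn = nn.Linear(2 * hidden, 1)
        self.gate = nn.Linear(2 * hidden, 2 * hidden, bias=False)

    def forward(self, passage_t, question):
        # passage_t: (batch, hidden) current passage word; question: (batch, q_len, hidden)
        pairs = torch.cat([question, passage_t.unsqueeze(1).expand_as(question)], dim=-1)
        scores = self.attn(pairs).squeeze(-1)                       # (batch, q_len)
        ctx = (torch.softmax(scores, dim=-1).unsqueeze(-1) * question).sum(dim=1)
        merged = torch.cat([passage_t, ctx], dim=-1)                # word + attended context
        return torch.sigmoid(self.gate(merged)) * merged            # gated RNN input
```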
Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification
TLDR
Three neural networks are developed to effectively incorporate supervision from the sentiment polarity of text (e.g., sentences or tweets) in their loss functions, and the performance of SSWE is improved by concatenating it with an existing feature set.
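The unified objective described here can be pictured as a weighted sum of a context (n-gram) hinge loss and a sentiment hinge loss over an original versus a corrupted n-gram. A minimal sketch under those assumptions, with illustrative tensor names:

```python
# A minimal sketch of an SSWE-style unified loss: hinge losses on context and
# sentiment scores, combined with a weighting factor alpha.
import torch

def sswe_unified_loss(f_orig, f_corrupt, polarity, alpha=0.5):
    # f_orig / f_corrupt: (batch, 2) scores [context_score, sentiment_score]
    # polarity: (batch,) +1 for positive text, -1 for negative text
    loss_context = torch.clamp(1.0 - f_orig[:, 0] + f_corrupt[:, 0], min=0.0)
    loss_sentiment = torch.clamp(
        1.0 - polarity * f_orig[:, 1] + polarity * f_corrupt[:, 1], min=0.0)
    return (alpha * loss_context + (1.0 - alpha) * loss_sentiment).mean()
```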
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
TLDR
A new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT), adopts the simple yet powerful Transformer model as its backbone and extends it to take both visual and linguistic embedded features as input.
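Taking visual and linguistic features jointly as input amounts to summing, for each element, a token embedding, a projected visual appearance feature, a segment embedding, and a position embedding. A simplified sketch with assumed dimensions and module names, not the released VL-BERT code:

```python
# A rough sketch of VL-BERT-style input construction: every input element sums
# textual, visual, segment, and position embeddings before the Transformer.
import torch
import torch.nn as nn

class VisualLinguisticEmbedding(nn.Module):
    def __init__(self, vocab_size=30522, hidden=768, visual_dim=2048, max_pos=512, n_segments=3):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        self.vis = nn.Linear(visual_dim, hidden)   # projects RoI appearance features
        self.seg = nn.Embedding(n_segments, hidden)
        self.pos = nn.Embedding(max_pos, hidden)

    def forward(self, token_ids, visual_feats, segment_ids):
        # token_ids, segment_ids: (batch, seq); visual_feats: (batch, seq, visual_dim)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return (self.tok(token_ids) + self.vis(visual_feats)
                + self.seg(segment_ids) + self.pos(positions))
```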
Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification
TLDR
AdaRNN adaptively propagates the sentiments of words to the target depending on the context and the syntactic relationships between them, and experiments show that AdaRNN improves over the baseline methods.
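Adaptive propagation can be read as choosing among several candidate composition functions at each tree node, with mixture weights predicted from the child vectors. A rough sketch of that composition step, with assumed shapes:

```python
# A hedged sketch of adaptive composition: the parent vector is a softmax-weighted
# mixture of candidate composition functions, with weights predicted from the children.
import torch
import torch.nn as nn

class AdaptiveComposition(nn.Module):
    def __init__(self, dim: int, n_compositions: int = 3):
        super().__init__()
        self.compose = nn.ModuleList(
            [nn.Linear(2 * dim, dim) for _ in range(n_compositions)])
        self.selector = nn.Linear(2 * dim, n_compositions)

    def forward(self, left: torch.Tensor, right: torch.Tensor) -> torch.Tensor:
        # left, right: (batch, dim) child node vectors in the syntactic tree
        children = torch.cat([left, right], dim=-1)
        weights = torch.softmax(self.selector(children), dim=-1)              # (batch, C)
        candidates = torch.stack([torch.tanh(g(children)) for g in self.compose], dim=1)
        return (weights.unsqueeze(-1) * candidates).sum(dim=1)                # (batch, dim)
```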
Neural Question Generation from Text: A Preliminary Study
TLDR
A preliminary study on neural question generation from text is conducted with the SQuAD dataset, and the experimental results show that the method can produce fluent and diverse questions.
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
TLDR
This paper proposes a new learning method, Oscar (Object-Semantics Aligned Pre-training), which uses object tags detected in images as anchor points to significantly ease the learning of image-text alignments.
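Using object tags as anchor points means the model sees a single sequence of caption words, detected tag words, and region features, with the tags embedded in the same space as the text. An illustrative sketch of that input construction, with assumed dimensions, not the released Oscar code:

```python
# An illustrative sketch of the (words, tags, regions) input triple: caption tokens
# and detected object tags share the text embedding space, while region features
# are projected into the same hidden size.
import torch
import torch.nn as nn

class OscarStyleInput(nn.Module):
    def __init__(self, vocab_size=30522, hidden=768, region_dim=2054):
        super().__init__()
        self.text_emb = nn.Embedding(vocab_size, hidden)   # shared by words and object tags
        self.region_proj = nn.Linear(region_dim, hidden)   # detected region features

    def forward(self, word_ids, tag_ids, region_feats):
        # word_ids: (B, Lw), tag_ids: (B, Lt), region_feats: (B, Lr, region_dim)
        words = self.text_emb(word_ids)
        tags = self.text_emb(tag_ids)        # tags act as anchors linking text and image
        regions = self.region_proj(region_feats)
        return torch.cat([words, tags, regions], dim=1)  # one sequence for the Transformer
```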
HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
TLDR
This work proposes HIBERT (shorthand for HIerarchical Bidirectional Encoder Representations from Transformers) for document encoding, along with a method to pre-train it using unlabeled data, and achieves state-of-the-art performance on the two summarization datasets evaluated.
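Hierarchical encoding here means a sentence-level Transformer first turns each sentence into a vector, and a document-level Transformer then contextualizes those sentence vectors. A simplified PyTorch sketch under assumed choices (e.g. taking the first token's state as the sentence vector):

```python
# A simplified sketch of hierarchical document encoding: sentence-level encoder,
# then document-level encoder over the resulting sentence vectors.
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, vocab_size=30522, hidden=512, nhead=8, layers=6):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)
        sent_layer = nn.TransformerEncoderLayer(hidden, nhead, batch_first=True)
        doc_layer = nn.TransformerEncoderLayer(hidden, nhead, batch_first=True)
        self.sent_enc = nn.TransformerEncoder(sent_layer, layers)
        self.doc_enc = nn.TransformerEncoder(doc_layer, layers)

    def forward(self, doc_tokens):
        # doc_tokens: (batch, n_sents, sent_len) token ids
        b, s, l = doc_tokens.shape
        tokens = self.sent_enc(self.emb(doc_tokens.view(b * s, l)))  # (b*s, l, hidden)
        sent_vecs = tokens[:, 0].view(b, s, -1)                      # first token as sentence vector
        return self.doc_enc(sent_vecs)                               # (b, s, hidden) sentence reps
```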
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
TLDR
This work presents a simple and effective approach to compress large Transformer (Vaswani et al., 2017) based pre-trained models, termed deep self-attention distillation, and demonstrates that the monolingual model outperforms state-of-the-art baselines across different parameter sizes of student models.
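Deep self-attention distillation trains the student to mimic the teacher's last-layer self-attention distributions, plus the relations among its value vectors. A hedged sketch of that objective with assumed tensor names and shapes:

```python
# A hedged sketch of the distillation objective: KL divergence between teacher and
# student attention distributions, plus KL over value relations (scaled dot-products
# of value vectors).
import torch
import torch.nn.functional as F

def self_attention_distillation_loss(t_attn, s_attn, t_values, s_values):
    # *_attn: (batch, heads, seq, seq) attention probabilities (rows sum to 1)
    # *_values: (batch, heads, seq, head_dim) value vectors of the last layer
    attn_loss = F.kl_div(torch.log(s_attn + 1e-9), t_attn, reduction="batchmean")

    def value_relation(v):
        rel = v @ v.transpose(-1, -2) / (v.size(-1) ** 0.5)   # (batch, heads, seq, seq)
        return torch.softmax(rel, dim=-1)

    vr_loss = F.kl_div(torch.log(value_relation(s_values) + 1e-9),
                       value_relation(t_values), reduction="batchmean")
    return attn_loss + vr_loss
```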
Recognizing Named Entities in Tweets
TLDR
This work proposes to combine a K-Nearest Neighbors classifier with a linear Conditional Random Fields model under a semi-supervised learning framework to tackle the challenges of Named Entity Recognition for tweets.
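The combination can be sketched as a word-level KNN classifier whose coarse prediction is fed as an extra feature to a sequence-level CRF. The sketch below uses scikit-learn's KNeighborsClassifier and the sklearn-crfsuite CRF as stand-ins for the authors' components; the feature set and data layout are illustrative assumptions:

```python
# A hedged sketch of KNN + CRF for tweet NER: the KNN's per-word prediction becomes
# an extra feature in each token's CRF feature dict.
from sklearn.neighbors import KNeighborsClassifier
import sklearn_crfsuite

def crf_features(tokens, knn_labels):
    # One feature dict per token; the KNN-predicted coarse label is an extra feature.
    return [{"word.lower": tok.lower(),
             "word.is_upper": tok.isupper(),
             "knn_label": knn_labels[i]}
            for i, tok in enumerate(tokens)]

def train(word_vectors, word_labels, sentences, sentence_labels):
    # word_vectors / word_labels: per-word representations and gold tags for the KNN step
    # sentences: list of (tokens, per-token vectors); sentence_labels: list of tag sequences
    knn = KNeighborsClassifier(n_neighbors=5).fit(word_vectors, word_labels)
    X = []
    for tokens, vecs in sentences:
        knn_labels = knn.predict(vecs).tolist()
        X.append(crf_features(tokens, knn_labels))
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit(X, sentence_labels)
    return knn, crf
```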