Unified Language Model Pre-training for Natural Language Understanding and Generation
A new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks, and that compares favorably with BERT on the GLUE benchmark and on the SQuAD 2.0 and CoQA question answering tasks.
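UniLM's unified modeling is realized with shared Transformer parameters and task-specific self-attention masks. Below is a minimal sketch of the sequence-to-sequence mask (source tokens attend bidirectionally, target tokens attend left-to-right); the segment lengths are hypothetical and the helper name is ours.

```python
import torch

def seq2seq_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """UniLM-style seq2seq LM mask: source tokens see the whole source segment
    bidirectionally; target tokens see the source plus preceding target tokens."""
    total = src_len + tgt_len
    mask = torch.zeros(total, total, dtype=torch.bool)
    mask[:, :src_len] = True                                   # everyone sees the source segment
    causal = torch.ones(tgt_len, tgt_len).tril().bool()
    mask[src_len:, src_len:] = causal                          # target side is left-to-right only
    return mask                                                # True = attention allowed

# Hypothetical segment lengths for illustration.
print(seq2seq_attention_mask(src_len=3, tgt_len=2).int())
```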
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
This paper proposes a new learning method, Oscar (Object-Semantics Aligned Pre-training), which uses object tags detected in images as anchor points to significantly ease the learning of image-text alignments.
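A minimal sketch of the kind of input this implies: caption word embeddings, embeddings of the detected object tags, and projected region features concatenated into one sequence, with the tags acting as anchor points shared by both modalities. The function name, dimensions, and the 2048-dim detector features are illustrative assumptions.

```python
import torch

def build_oscar_style_input(caption_ids, tag_ids, region_feats,
                            text_emb, tag_emb, region_proj):
    """Concatenate caption embeddings, object-tag embeddings, and projected
    region features into a single sequence for the Transformer."""
    w = text_emb(caption_ids)           # (num_words, d)
    q = tag_emb(tag_ids)                # (num_tags, d)  - tags written as words
    v = region_proj(region_feats)       # (num_regions, d)
    return torch.cat([w, q, v], dim=0)

d, vocab = 16, 100
text_emb = torch.nn.Embedding(vocab, d)
tag_emb = text_emb                       # tags share the word embedding table (assumed)
region_proj = torch.nn.Linear(2048, d)   # detector feature size is an assumption

seq = build_oscar_style_input(torch.tensor([5, 7, 9]), torch.tensor([7, 12]),
                              torch.randn(2, 2048), text_emb, tag_emb, region_proj)
print(seq.shape)   # (3 words + 2 tags + 2 regions, d)
```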
BEiT: BERT Pre-Training of Image Transformers
A self-supervised vision representation model, BEiT (Bidirectional Encoder representation from Image Transformers), is introduced; experimental results on image classification and semantic segmentation show that the model achieves results competitive with previous pre-training methods.
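The BERT-style pre-training in the title amounts to masked image modeling: corrupt some image patches and predict the discrete visual tokens of the masked positions. A rough sketch under assumed shapes, with stand-ins for the patch projection, the visual tokenizer, and the vision Transformer:

```python
import torch
import torch.nn.functional as F

num_patches, d_model, vocab_size, mask_ratio = 196, 32, 8192, 0.4

patch_embeddings = torch.randn(num_patches, d_model)          # stand-in for the patch projection
visual_tokens = torch.randint(0, vocab_size, (num_patches,))  # stand-in for a discrete visual tokenizer

mask = torch.rand(num_patches) < mask_ratio                   # which patches to corrupt
mask_embedding = torch.zeros(d_model)
corrupted = torch.where(mask.unsqueeze(-1), mask_embedding, patch_embeddings)

# Stand-ins for the vision Transformer encoder and the prediction head.
encoder = torch.nn.Sequential(torch.nn.Linear(d_model, d_model), torch.nn.GELU())
head = torch.nn.Linear(d_model, vocab_size)

logits = head(encoder(corrupted))
loss = F.cross_entropy(logits[mask], visual_tokens[mask])     # loss only on masked patches
print(float(loss))
```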
Language to Logical Form with Neural Attention
This paper presents a general method based on an attention-enhanced encoder-decoder model that encodes input utterances into vector representations and generates their logical forms by conditioning the output sequences or trees on the encoding vectors.
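A minimal sketch of the attention step in such an encoder-decoder: the current decoder state scores each encoder state, and the normalized weighted sum is the context used when emitting the next logical-form token. Sizes are illustrative.

```python
import torch
import torch.nn.functional as F

def attention_context(decoder_state, encoder_states):
    """Score each encoder state against the decoder state, normalize, and
    return the weighted sum used to predict the next output token."""
    scores = encoder_states @ decoder_state        # (src_len,)
    weights = F.softmax(scores, dim=0)
    return weights @ encoder_states                # (hidden,)

# Illustrative sizes: a 5-token utterance encoded into 8-dim states.
encoder_states = torch.randn(5, 8)
decoder_state = torch.randn(8)
print(attention_context(decoder_state, encoder_states).shape)
```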
Adaptive Recursive Neural Network for Target-dependent Twitter Sentiment Classification
AdaRNN adaptively propagates the sentiments of words towards the target depending on the context and the syntactic relationships between them, and it is shown that AdaRNN improves over the baseline methods.
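A rough sketch of what "adaptive" propagation can look like: when two nodes are combined along the path towards the target, several candidate composition functions are mixed with input-dependent weights. The gating scheme and the number of composition functions below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

class AdaptiveComposition(torch.nn.Module):
    """Mix several composition functions with weights chosen from the inputs."""
    def __init__(self, dim: int, num_functions: int = 3):
        super().__init__()
        self.compose = torch.nn.ModuleList(
            [torch.nn.Linear(2 * dim, dim) for _ in range(num_functions)])
        self.gate = torch.nn.Linear(2 * dim, num_functions)

    def forward(self, child_a: torch.Tensor, child_b: torch.Tensor) -> torch.Tensor:
        pair = torch.cat([child_a, child_b], dim=-1)
        weights = F.softmax(self.gate(pair), dim=-1)             # soft choice of composition
        candidates = torch.stack([torch.tanh(f(pair)) for f in self.compose])
        return (weights.unsqueeze(-1) * candidates).sum(dim=0)   # adaptive mixture

comp = AdaptiveComposition(dim=4)
print(comp(torch.randn(4), torch.randn(4)).shape)
```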
Long Short-Term Memory-Networks for Machine Reading
A machine reading simulator that processes text incrementally from left to right and performs shallow reasoning with memory and attention; it extends the Long Short-Term Memory architecture with a memory network in place of a single memory cell, offering a way to weakly induce relations among tokens.
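A compact sketch of that idea: keep a tape of all previous hidden states instead of a single cell, and let each new token attend over the tape to build the state it is combined with. The recurrent update below is deliberately simplified (gate-free) for illustration; only the intra-attention over the history is the point.

```python
import torch
import torch.nn.functional as F

class MemoryTape:
    """Attend over all previous hidden states at each reading step."""
    def __init__(self):
        self.hidden = []

    def step(self, x: torch.Tensor) -> torch.Tensor:
        if self.hidden:
            tape = torch.stack(self.hidden)             # (t, d) history of hidden states
            weights = F.softmax(tape @ x, dim=0)        # intra-attention: token t+1 vs tokens 1..t
            summary = weights @ tape                    # adaptive "previous hidden state"
        else:
            summary = torch.zeros_like(x)
        h = torch.tanh(x + summary)                     # simplified recurrent update
        self.hidden.append(h)
        return h

tape = MemoryTape()
for token_vec in torch.randn(5, 8):                     # 5 tokens, 8-dim embeddings
    h = tape.step(token_vec)
print(h.shape)
```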
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
- Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
- Computer Science · NeurIPS
- 25 February 2020
This work presents a simple and effective approach to compress large Transformer (Vaswani et al., 2017) based pre-trained models, termed deep self-attention distillation, and demonstrates that the monolingual model outperforms state-of-the-art baselines across student models of different parameter sizes.
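A sketch of the distillation objective: the student's last-layer self-attention distributions and value relations (scaled dot-products among the values) are trained, via KL divergence, to match the teacher's. The dict packaging and tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def attention_distributions(queries, keys):
    """Self-attention distributions: softmax of scaled query-key products."""
    return F.softmax(queries @ keys.transpose(-1, -2) / queries.size(-1) ** 0.5, dim=-1)

def value_relation(values):
    """Value relation: softmax of scaled dot-products among the values."""
    return F.softmax(values @ values.transpose(-1, -2) / values.size(-1) ** 0.5, dim=-1)

def deep_self_attention_distillation_loss(teacher, student):
    """Match student attention distributions and value relations to the teacher's.
    `teacher`/`student` are dicts of (heads, len, d) tensors (an assumed packaging)."""
    att_t = attention_distributions(teacher["q"], teacher["k"])
    att_s = attention_distributions(student["q"], student["k"])
    vr_t = value_relation(teacher["v"])
    vr_s = value_relation(student["v"])
    kl = lambda target, pred: F.kl_div(pred.log(), target, reduction="batchmean")
    return kl(att_t, att_s) + kl(vr_t, vr_s)

heads, length = 2, 4
teacher = {k: torch.randn(heads, length, 16) for k in ("q", "k", "v")}
student = {k: torch.randn(heads, length, 8) for k in ("q", "k", "v")}  # smaller hidden size
print(float(deep_self_attention_distillation_loss(teacher, student)))
```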
Data-to-Text Generation with Content Selection and Planning
This work presents a neural network architecture that incorporates content selection and planning without sacrificing end-to-end training, and shows that this model outperforms strong baselines, improving the state of the art on the recently released RotoWire dataset.
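A small sketch of the content-selection step: each database record gets a gate in [0, 1], computed from its own encoding and the context of the other records, that scales how much of it is passed on to the content planner. Layer names and sizes are illustrative assumptions.

```python
import torch

class ContentSelectionGate(torch.nn.Module):
    """Gate each record encoding using attention over the other records."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn = torch.nn.Linear(dim, dim, bias=False)
        self.gate = torch.nn.Linear(2 * dim, dim)

    def forward(self, records: torch.Tensor) -> torch.Tensor:    # (num_records, dim)
        scores = records @ self.attn(records).T                   # record-to-record attention
        scores.fill_diagonal_(float("-inf"))                      # a record does not attend to itself
        context = torch.softmax(scores, dim=-1) @ records
        g = torch.sigmoid(self.gate(torch.cat([records, context], dim=-1)))
        return g * records                                        # gated records fed to the planner

records = torch.randn(10, 16)                                     # e.g. 10 box-score records
print(ContentSelectionGate(16)(records).shape)
```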
Coarse-to-Fine Decoding for Neural Semantic Parsing
This work proposes a structure-aware neural architecture which decomposes the semantic parsing process into two stages, and shows that this approach consistently improves performance, achieving competitive results despite the use of relatively simple decoders.
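A toy illustration of the two-stage decomposition: a coarse decoder first emits a sketch of the meaning representation with placeholders for low-level details, and a fine decoder then fills those placeholders in, conditioned on both the utterance and the sketch. The rule-based functions and the example logical form below are placeholders standing in for the two neural decoders.

```python
def coarse_decode(utterance: str) -> list[str]:
    # Stage 1: predict only the abstract structure, with placeholders.
    return ["answer(", "from(", "@1", ")", "to(", "@2", ")", ")"]

def fine_decode(utterance: str, sketch: list[str]) -> list[str]:
    # Stage 2: expand each placeholder into a concrete argument,
    # conditioned on the utterance and the sketch.
    fillers = {"@1": "dallas", "@2": "boston"}   # hypothetical predictions
    return [fillers.get(token, token) for token in sketch]

utterance = "flights from dallas to boston"
sketch = coarse_decode(utterance)
print(" ".join(fine_decode(utterance, sketch)))
```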
Question Answering over Freebase with Multi-Column Convolutional Neural Networks
This paper introduces multi-column convolutional neural networks (MCCNNs) to understand questions from three different aspects and to learn their distributed representations, and develops a method to compute the salience scores of question words in the different column networks.
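A minimal sketch of the multi-column idea: several separate convolutional "columns" read the same question, each producing its own representation so that each aspect of the answer gets its own view of the question. Sizes, the pooling choice, and the kernel width below are illustrative assumptions.

```python
import torch

class QuestionColumn(torch.nn.Module):
    """One convolutional column over the question's word vectors."""
    def __init__(self, emb_dim: int, hidden: int):
        super().__init__()
        self.conv = torch.nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)

    def forward(self, word_vectors: torch.Tensor) -> torch.Tensor:  # (len, emb_dim)
        h = torch.relu(self.conv(word_vectors.T.unsqueeze(0)))      # (1, hidden, len)
        return h.max(dim=-1).values.squeeze(0)                      # max-pool over positions

emb_dim, hidden = 16, 32
columns = torch.nn.ModuleList(QuestionColumn(emb_dim, hidden) for _ in range(3))
question = torch.randn(7, emb_dim)                                  # a 7-word question
views = [col(question) for col in columns]                          # one vector per aspect
print([v.shape for v in views])
```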