• Publications
Multi-Task Deep Neural Networks for Natural Language Understanding
TLDR
A Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks that allows domain adaptation with substantially fewer in-domain labels than the pre-trained BERT representations.
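As a rough illustration of the shared-encoder, task-specific-head layout that multi-task NLU models such as MT-DNN use, here is a minimal PyTorch sketch; the encoder argument, hidden_dim, and task_num_labels names are hypothetical stand-ins, not the paper's code.

import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Sketch: one shared encoder, one lightweight output head per task."""
    def __init__(self, encoder, hidden_dim, task_num_labels):
        super().__init__()
        self.encoder = encoder  # shared (e.g. a pre-trained BERT-style encoder)
        self.heads = nn.ModuleDict({  # task-specific output layers
            task: nn.Linear(hidden_dim, n_labels)
            for task, n_labels in task_num_labels.items()
        })

    def forward(self, inputs, task):
        pooled = self.encoder(inputs)     # (batch, hidden_dim) sentence vector
        return self.heads[task](pooled)   # logits for the requested task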
On the Variance of the Adaptive Learning Rate and Beyond
TLDR
This work identifies a problem with the adaptive learning rate, suggests that warmup works as a variance reduction technique, and proposes RAdam, a new variant of Adam, which introduces a term to rectify the variance of the adaptive learning rate.
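The rectification term at the heart of RAdam can be written down compactly. The sketch below follows the formula in the paper, with beta2 as Adam's second-moment decay; the function name and the None fallback convention are illustrative only.

import math

def radam_rectification(step, beta2=0.999):
    """Variance-rectification term from RAdam (sketch); step must be >= 1.
    Returns None while the variance of the adaptive learning rate is still
    intractable, in which case RAdam falls back to an SGD-with-momentum
    style update for that step."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    rho_t = rho_inf - 2.0 * step * beta2**step / (1.0 - beta2**step)
    if rho_t <= 4.0:  # variance not yet tractable
        return None
    return math.sqrt((rho_t - 4.0) * (rho_t - 2.0) * rho_inf /
                     ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))

In practice, recent versions of PyTorch ship a ready-made torch.optim.RAdam that can be dropped in where Adam would be used.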
Unified Language Model Pre-training for Natural Language Understanding and Generation
TLDR
A new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks that compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks.
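UniLM's unified pre-training hinges on controlling what each token may attend to via different self-attention masks. The snippet below is a hedged sketch of the sequence-to-sequence mask only (source tokens attend bidirectionally within the source; target tokens attend to the source and to earlier target tokens), using hypothetical argument names.

import torch

def seq2seq_attention_mask(src_len, tgt_len):
    """Sketch of a UniLM-style seq2seq mask: True = attention allowed."""
    n = src_len + tgt_len
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:, :src_len] = True                                   # everyone sees the source
    mask[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len)).bool()  # causal target
    return mask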
RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers
TLDR
This work presents a unified framework, based on the relation-aware self-attention mechanism, to address schema encoding, schema linking, and feature representation within a text-to-SQL encoder and achieves the new state-of-the-art performance on the Spider leaderboard.
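A single-head sketch of relation-aware self-attention, where learned relation embeddings bias the keys and values, may help make the mechanism concrete; this is a simplified illustration (no multi-head attention, masking, or batching) and not the RAT-SQL implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationAwareAttention(nn.Module):
    """Sketch: attention between items i and j is biased by a learned
    embedding of the discrete relation label between them."""
    def __init__(self, dim, num_relations):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.rel_k = nn.Embedding(num_relations, dim)
        self.rel_v = nn.Embedding(num_relations, dim)

    def forward(self, x, rel_ids):
        # x: (n, dim) item encodings; rel_ids: (n, n) relation labels
        q, k, v = self.q(x), self.k(x), self.v(x)
        rk, rv = self.rel_k(rel_ids), self.rel_v(rel_ids)   # (n, n, dim)
        scores = (q.unsqueeze(1) * (k.unsqueeze(0) + rk)).sum(-1)
        attn = F.softmax(scores / x.size(-1) ** 0.5, dim=-1)  # (n, n)
        return (attn.unsqueeze(-1) * (v.unsqueeze(0) + rv)).sum(1)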
New approaches to H∞ controller designs based on fuzzy observers for T-S fuzzy systems via LMI
TLDR
Using the LMI technique, it is shown that the regulators, the fuzzy observers, and the H∞ controller designs based on new observers for T-S fuzzy systems are practical and efficient.
Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing
TLDR
A cyclical annealing schedule is proposed, which simply repeats the process of increasing β multiple times and allows the model to learn more meaningful latent codes progressively by leveraging the results of previous learning cycles as warm re-starts.
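The schedule itself is only a few lines: within each cycle, β ramps from 0 to 1 over the first part of the cycle and then stays at 1 until the next cycle begins. The sketch below assumes a linear ramp and hypothetical parameter names.

def cyclical_beta(step, total_steps, n_cycles=4, ratio=0.5):
    """Sketch of a cyclical KL-annealing schedule: beta rises linearly from
    0 to 1 over the first `ratio` fraction of each cycle, then is held at 1."""
    cycle_len = total_steps / n_cycles
    tau = (step % cycle_len) / cycle_len   # position within the current cycle, in [0, 1)
    return min(1.0, tau / ratio)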
SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
TLDR
A new learning framework for robust and efficient fine-tuning of pre-trained models that attains better generalization performance and outperforms the state-of-the-art T5 model, the largest pre-trained model with 11 billion parameters, on GLUE.
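The core regularizer in SMART penalizes how much the model's predictions change under a small adversarial perturbation of the input embeddings. The following one-step sketch assumes a hypothetical forward_fn callable that maps input embeddings to logits; the hyperparameters are placeholders, not the paper's settings.

import torch
import torch.nn.functional as F

def smoothness_regularizer(forward_fn, embeds, clean_logits, eps=1e-5, step_size=1e-3):
    """Sketch of a smoothness-inducing adversarial regularizer: refine a small
    random perturbation with one gradient-ascent step on the KL divergence,
    then return the symmetric KL between clean and perturbed predictions."""
    noise = (torch.randn_like(embeds) * eps).requires_grad_()
    adv_logits = forward_fn(embeds + noise)
    kl = F.kl_div(F.log_softmax(adv_logits, -1),
                  F.softmax(clean_logits.detach(), -1), reduction="batchmean")
    grad, = torch.autograd.grad(kl, noise)
    noise = (noise + step_size * grad / (grad.norm() + 1e-12)).detach()

    adv_logits = forward_fn(embeds + noise)
    log_p = F.log_softmax(clean_logits, -1)
    log_q = F.log_softmax(adv_logits, -1)
    # symmetric KL; gradients flow into the model through both branches
    return (F.kl_div(log_q, log_p.exp(), reduction="batchmean")
            + F.kl_div(log_p, log_q.exp(), reduction="batchmean"))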
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
TLDR
It is shown that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
TLDR
This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks and shows that the distilled MT-DNN significantly outperforms the original MT-DNN on 7 out of 9 GLUE tasks.
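For reference, a standard soft-target distillation objective looks like the sketch below; the MT-DNN distillation setup uses task-specific teacher ensembles, so treat this only as the generic idea, with T and alpha as hypothetical hyperparameters.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Sketch: weighted sum of cross-entropy on gold labels and KL divergence
    between temperature-softened teacher and student distributions."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits.detach() / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft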
The fuzzy sets and systems based on AFS structure, EI algebra and EII algebra
TLDR
EI algebra and EII algebra, which are infinite distributive molecular lattices, and the AFS structure, a special system in the sense of Graver and Watkins (1977), are defined, establishing a totally new system of fuzzy sets and systems.