Multi-Task Deep Neural Networks for Natural Language Understanding
TLDR
We present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks.
  • 348
  • 80
On the Variance of the Adaptive Learning Rate and Beyond
TLDR
The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.
  • 322
  • 60
Unified Language Model Pre-training for Natural Language Understanding and Generation
TLDR
This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.
  • 265
  • 47
New approaches to H∞ controller designs based on fuzzy observers for T-S fuzzy systems via LMI
TLDR
The problems of relaxed quadratic stability conditions, fuzzy observer designs and H∞ controller designs for T-S fuzzy systems have been studied.
  • 271
  • 31
Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing
TLDR
We study different scheduling schemes for β, and show that KL vanishing is caused by the lack of good latent codes when training the decoder at the beginning of optimization.
  • 63
  • 20
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
TLDR
This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks.
  • 64
  • 16
Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading
TLDR
We present a new end-to-end approach to contentful neural conversation that jointly models response generation and on-demand machine reading.
  • 43
  • 6
RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers
TLDR
We present a unified framework, based on the relation-aware self-attention mechanism, to address schema encoding, schema linking, and feature representation within a text-to-SQL encoder.
  • 26
  • 6
Identification and Efficient Estimation of Simultaneous Equations Network Models
This article considers identification and estimation of social network models in a system of simultaneous equations. We show that, with or without row-normalization of the social adjacency matrix, …
  • 31
  • 6
Novel artificial intelligent techniques via AFS theory: Feature selection, concept categorization and characteristic description
TLDR
Axiomatic Fuzzy Set (AFS) theory, in which fuzzy sets (membership functions) and their logic operations are determined by a consistent algorithm according to the distributions of the original data and the semantics of the fuzzy concepts, is applied to study new techniques for feature selection, concept categorization and characteristic description.
  • 24
  • 5