Pre-trained Models for Natural Language Processing: A Survey
This survey is intended as a hands-on guide to understanding, using, and developing PTMs for various NLP tasks.
Star-Transformer replaces the fully connected structure with a star-shaped topology in which every pair of non-adjacent nodes is connected through a shared relay node. This reduces complexity from quadratic to linear while preserving the capacity to capture both local composition and long-range dependencies.
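The star topology can be illustrated with a small attention-mask sketch (this is an illustrative reconstruction, not the paper's code; the node layout and `window` parameter are assumptions): one relay node attends to all satellites, and each satellite attends only to its ring neighbors and the relay, so the number of connections grows linearly with sequence length.

```python
import numpy as np

def star_mask(n, window=1):
    """Boolean attention mask for a star-shaped topology (sketch).
    Node 0 is the shared relay; nodes 1..n are satellite (token) nodes.
    Satellites see their ring neighbors within `window` plus the relay;
    the relay sees every node."""
    size = n + 1
    mask = np.zeros((size, size), dtype=bool)
    mask[0, :] = True          # relay attends to all nodes
    mask[:, 0] = True          # every satellite attends to the relay
    for i in range(1, size):   # local ring connections
        for d in range(-window, window + 1):
            j = 1 + (i - 1 + d) % n
            mask[i, j] = True
    return mask

m = star_mask(5)
# each satellite row has at most 2*window + 2 allowed positions,
# so total connections are O(n) rather than O(n^2)
```

Any two non-adjacent tokens are still reachable in two hops through the relay, which is how the topology keeps long-range dependency modeling despite the sparse mask.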
Learning Sparse Sharing Architectures for Multiple Tasks
It is shown that both hard sharing and hierarchical sharing can be formulated as particular instances of the sparse sharing framework. Compared with single-task models and three typical multi-task learning baselines, the proposed approach achieves consistent improvements while requiring fewer parameters.
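The idea that hard sharing is a special case of sparse sharing can be sketched in a few lines (a minimal illustration under assumed shapes, not the paper's implementation): each task selects a binary mask over one shared parameter vector, and an all-ones mask recovers hard sharing.

```python
import numpy as np

rng = np.random.default_rng(0)
shared = rng.normal(size=8)  # one set of shared parameters

# each task extracts its own sparse subnetwork via a binary mask;
# a mask of all ones would reduce to classic hard parameter sharing
masks = {
    "task_a": np.array([1, 1, 1, 0, 0, 1, 0, 1], dtype=bool),
    "task_b": np.array([1, 0, 1, 1, 1, 1, 0, 0], dtype=bool),
}
subnets = {t: np.where(m, shared, 0.0) for t, m in masks.items()}
```

Parameters selected by several masks are trained jointly (shared), while masked-out entries cost a task nothing, which is why the overall model needs fewer parameters than separate single-task networks.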
CoLAKE: Contextualized Language and Knowledge Embedding
The Contextualized Language and Knowledge Embedding (CoLAKE) is proposed, which jointly learns contextualized representations for both language and knowledge with an extended MLM objective. It achieves surprisingly high performance on a synthetic task called word-knowledge graph completion, which shows the superiority of simultaneously contextualizing language and knowledge representations.
Accelerating BERT Inference for Sequence Labeling via Early-Exit
Extensive experiments on three popular sequence labeling tasks show that the proposed SENTEE approach can reduce inference cost by 66%–75% with minimal performance degradation, and achieves better performance under the same speed-up ratios of 2×, 3×, and 4×.
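A sentence-level early-exit decision of this kind can be sketched as follows (a hypothetical simplification, not SENTEE's code; the confidence-threshold criterion and array shapes are assumptions): after each Transformer layer, an intermediate classifier produces label probabilities per token, and the whole sentence exits as soon as even its least confident token clears a threshold.

```python
import numpy as np

def sentence_early_exit(layer_probs, threshold=0.9):
    """Sentence-level early-exit sketch for sequence labeling.
    layer_probs: list of [seq_len, num_labels] probability arrays,
    one per layer, from intermediate classifiers.
    Returns (exit_depth, predicted_label_ids)."""
    for depth, probs in enumerate(layer_probs, start=1):
        # exit only when the least confident token is confident enough,
        # so one hard token keeps the whole sentence in the network
        if probs.max(axis=-1).min() >= threshold:
            return depth, probs.argmax(axis=-1)
    # no layer was confident enough: use the final layer's prediction
    return len(layer_probs), layer_probs[-1].argmax(axis=-1)
```

Skipping the remaining layers for easy sentences is where the inference savings come from; the threshold trades speed-up against accuracy.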
Generating Adversarial Examples in Chinese Texts Using Sentence-Pieces
A pre-trained language model is proposed as the substitute generator, using sentence-pieces to craft adversarial examples in Chinese that mislead strong target models while remaining fluent and semantically preserved.