Pre-trained Models for Natural Language Processing: A Survey
TLDR
This survey is intended to serve as a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.
Learning Sparse Sharing Architectures for Multiple Tasks
TLDR
It is shown that both hard sharing and hierarchical sharing can be formulated as particular instances of the sparse sharing framework; compared with single-task models and three typical multi-task learning baselines, the proposed approach achieves consistent improvements while requiring fewer parameters.
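The core mechanism is easy to sketch: every task optimizes the same base parameters, but only through a task-specific binary mask, so each task effectively trains a subnetwork. Below is a minimal PyTorch sketch; the layer sizes, random masks, and class name are illustrative assumptions (the paper extracts masks by pruning, not at random):

```python
import torch
import torch.nn as nn

# Minimal sketch of sparse parameter sharing: each task reads the shared
# weights through its own binary mask, so a task only "sees" a subnetwork.
# Hard sharing corresponds to all-ones masks; hierarchical sharing to
# nested masks. Masks here are random purely for illustration.

class SparseSharedLinear(nn.Module):
    def __init__(self, in_dim, out_dim, num_tasks, keep_prob=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.02)
        # one fixed binary mask per task (found by pruning in the paper)
        self.register_buffer(
            "masks", (torch.rand(num_tasks, out_dim, in_dim) < keep_prob).float()
        )

    def forward(self, x, task_id):
        # gradients flow only through the entries this task's mask keeps
        return x @ (self.weight * self.masks[task_id]).t()

layer = SparseSharedLinear(128, 64, num_tasks=3)
out = layer(torch.randn(8, 128), task_id=1)  # task 1 trains its own subnetwork
```

Under this view, hard sharing is simply the special case where every task's mask is all ones.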
Does syntax matter? A strong baseline for Aspect-based Sentiment Analysis with RoBERTa
TLDR
This paper compares trees induced from PTMs with dependency-parser trees across several popular models for the ABSA task, showing that the tree induced from fine-tuned RoBERTa (FT-RoBERTa) outperforms the parser-provided tree, and reveals that the FT-RoBERTa induced tree is more sentiment-word-oriented and can benefit the ABSA task.
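For context, a common way to induce such trees from a PTM is perturbed masking: mask one token at a time, measure how much every other token's representation shifts, and feed the resulting impact matrix to a tree decoder (e.g., maximum spanning tree). Below is a simplified single-mask sketch of the impact-matrix step; the roberta-base checkpoint, the distance function, and the omission of tree decoding are assumptions for illustration:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base").eval()

def impact_matrix(sentence: str) -> torch.Tensor:
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    n = ids.size(1)
    with torch.no_grad():
        base = model(ids).last_hidden_state[0]  # unperturbed representations
        impact = torch.zeros(n, n)
        for j in range(n):
            perturbed = ids.clone()
            perturbed[0, j] = tokenizer.mask_token_id  # mask token j
            h = model(perturbed).last_hidden_state[0]
            # impact of token j on token i = shift in i's representation
            impact[:, j] = (base - h).norm(dim=-1)
    return impact
```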
CoLAKE: Contextualized Language and Knowledge Embedding
TLDR
The Contextualized Language and Knowledge Embedding (CoLAKE) is proposed, which jointly learns contextualized representations for both language and knowledge with an extended MLM objective, and achieves surprisingly high performance on a synthetic task called word-knowledge graph completion, which shows the superiority of simultaneously contextualizing language and knowledge representations.
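In spirit, the extended MLM objective treats a sentence and its linked entities as one sequence of nodes and asks a single encoder to recover masked nodes of either type. A heavily simplified sketch follows, with illustrative vocabulary sizes and without CoLAKE's graph-structured attention mask or position and type embeddings:

```python
import torch
import torch.nn as nn

# Words and entities share one embedding table so a single transformer can
# contextualize both node types; the head predicts masked nodes of either
# type. Sizes and the flat node sequence are illustrative assumptions.
class UnifiedMLM(nn.Module):
    def __init__(self, word_vocab=30522, entity_vocab=50000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(word_vocab + entity_vocab, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, word_vocab + entity_vocab)

    def forward(self, node_ids, mask_positions):
        h = self.encoder(self.embed(node_ids))
        return self.head(h[:, mask_positions])  # logits only at masked nodes

model = UnifiedMLM()
nodes = torch.randint(0, 30522, (2, 16))      # a flattened word-knowledge graph
logits = model(nodes, mask_positions=[3, 7])  # predict the two masked nodes
```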
Accelerating BERT Inference for Sequence Labeling via Early-Exit
TLDR
Extensive experiments on three popular sequence labeling tasks show that the proposed SENTEE approach can save 66%∼75% of the inference cost with minimal performance degradation, and can achieve better performance at the same speed-up ratios of 2×, 3×, and 4×.
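The underlying idea is that the whole sentence leaves the network early once an internal classifier is confident about every token, rather than letting tokens exit independently (which self-attention's cross-token dependencies make problematic). A minimal sketch, where the layer count, threshold, and the uncertainty measure (max over tokens of one minus the top softmax probability) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EarlyExitTagger(nn.Module):
    def __init__(self, dim=256, num_layers=6, num_labels=9, threshold=0.1):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        # one internal classifier per layer, trained to label every token
        self.classifiers = nn.ModuleList(
            nn.Linear(dim, num_labels) for _ in range(num_layers)
        )
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, h):  # h: (1, seq_len, dim) token representations
        for layer, clf in zip(self.layers, self.classifiers):
            h = layer(h)
            probs = clf(h).softmax(dim=-1)
            # the sentence is only as certain as its least confident token
            uncertainty = (1 - probs.max(dim=-1).values).max()
            if uncertainty < self.threshold:
                break  # confident enough: skip the remaining layers
        return probs.argmax(dim=-1)
```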
Early Exiting with Ensemble Internal Classifiers
TLDR
It is shown that a novel objective function for training the ensemble of internal classifiers can be naturally induced from the perspectives of ensemble learning and information theory, and a simple voting-based strategy is proposed that achieves better accuracy-speed trade-offs.
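The voting-based strategy is straightforward to sketch: each internal classifier casts a vote for its predicted label, and the sample exits as soon as some label accumulates enough votes. A minimal sketch follows; the vote threshold and the precomputed per-layer logits are illustrative assumptions (a real implementation would compute logits layer by layer and stop running deeper layers at the exit point):

```python
import torch

def vote_exit(logits_per_layer, needed_votes=3):
    """Return (prediction, number of layers used) under a simple vote rule."""
    votes = {}
    for depth, logits in enumerate(logits_per_layer):  # shallowest first
        pred = int(logits.argmax())
        votes[pred] = votes.get(pred, 0) + 1
        if votes[pred] >= needed_votes:
            return pred, depth + 1  # enough agreement: exit here
    return pred, len(logits_per_layer)  # no early consensus: use all layers

# e.g., six internal classifiers over four labels
pred, layers_used = vote_exit([torch.randn(4) for _ in range(6)])
```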