TinyBERT: Distilling BERT for Natural Language Understanding
This paper proposes a novel Transformer distillation method designed specifically for knowledge distillation (KD) of Transformer-based models. Using this method, the rich knowledge encoded in a large “teacher” BERT can be effectively transferred to a small “student” TinyBERT.
ERNIE: Enhanced Language Representation with Informative Entities
This paper utilizes both large-scale textual corpora and KGs to train an enhanced language representation model (ERNIE) which can take full advantage of lexical, syntactic, and knowledge information simultaneously, and is comparable with the state-of-the-art model BERT on other common NLP tasks.
HHMM-based Chinese Lexical Analyzer ICTCLAS
This document presents the results from Inst. of Computing Tech., CAS in the ACL SIGHAN-sponsored First International Chinese Word Segmentation Bake-off. The authors introduce the unified HHMM-based
Findings of the 2017 Conference on Machine Translation (WMT17)
This paper presents the results of the WMT17 shared tasks, which included three machine translation (MT) tasks (news, biomedical, and multimodal), two evaluation tasks (metrics and run-time
Exploiting Cross-Sentence Context for Neural Machine Translation
This paper proposes a cross-sentence context-aware approach and investigates the influence of historical contextual information on the performance of neural machine translation (NMT).
Word-level Textual Adversarial Attacking as Combinatorial Optimization
This paper proposes a novel attack model that combines a sememe-based word substitution method with a particle swarm optimization-based search algorithm, each addressing one of the two problems separately. The model consistently achieves much higher attack success rates and crafts higher-quality adversarial examples than baseline methods.
Maximum Entropy Based Phrase Reordering Model for Statistical Machine Translation
This paper proposes a novel reordering model for phrase-based statistical machine translation (SMT) that uses a maximum entropy (MaxEnt) model to predict reorderings of neighboring blocks (phrase pairs). The model obtains significant improvements in BLEU score on the NIST MT-05 and IWSLT-04 tasks.
Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search
Experiments show that Grid Beam Search (GBS) can provide large improvements in translation quality in interactive scenarios and that, even without any user input, it can achieve significant performance gains in domain adaptation scenarios.
基於《知網》的辭彙語義相似度計算 (Word Similarity Computing Based on How-net) [In Chinese]
The How-net definition of a word is rewritten in a more structured format using the abstract data structures of sets and feature structures, and similarity measures are given for sememes, for sets, and for feature structures.
Tree-to-String Alignment Template for Statistical Machine Translation
This paper proposes a novel translation model based on tree-to-string alignment templates (TATs), which describe the alignment between a source parse tree and a target string. The model significantly outperforms Pharaoh, a state-of-the-art decoder for phrase-based models.