Pre-Training with Whole Word Masking for Chinese BERT
TLDR
This technical report adapts whole word masking to Chinese text, masking whole words instead of individual Chinese characters, which poses a new challenge for the Masked Language Model (MLM) pre-training task.
Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency
TLDR
A new word replacement order determined by both the word saliency and the classification probability is introduced, and a greedy algorithm called probability weighted word saliency (PWWS) is proposed for text adversarial attack.
LTP: A Chinese Language Technology Platform
TLDR
LTP (Language Technology Platform) is an integrated Chinese processing platform comprising a suite of high-performance natural language processing modules and relevant corpora, which has achieved good results in relevant evaluations such as CoNLL and SemEval.
Learning Semantic Hierarchies via Word Embeddings
TLDR
This paper proposes a novel and effective method for the construction of semantic hierarchies based on word embeddings, which can be used to measure the semantic relationship between words.
Towards Better UD Parsing: Deep Contextualized Word Embeddings, Ensemble, and Treebank Concatenation
TLDR
This paper describes the system (HIT-SCIR) submitted to the CoNLL 2018 shared task on Multilingual Parsing from Raw Text to Universal Dependencies, which was ranked first according to LAS and outperformed the other systems by a large margin.
A Span-Extraction Dataset for Chinese Machine Reading Comprehension
TLDR
This paper introduces a span-extraction dataset for Chinese machine reading comprehension to add linguistic diversity to the area, on which the Second Evaluation Workshop on Chinese Machine Reading Comprehension (CMRC 2018) was hosted.
Cross-lingual Dependency Parsing Based on Distributed Representations
TLDR
This paper provides two algorithms for inducing cross-lingual distributed representations of words, which map vocabularies from two different languages into a common vector space, bridging the lexical feature gap by using distributed feature representations and their composition.
Revisiting Embedding Features for Simple Semi-supervised Learning
TLDR
Experiments on the task of named entity recognition show that each of the proposed approaches can better utilize the word embedding features, among which the distributional prototype approach performs the best.
A Representation Learning Framework for Multi-Source Transfer Parsing
TLDR
This work presents a novel representation learning framework that enables multi-source transfer parsing with full lexical features in a straightforward way, and significantly outperforms the most recently proposed state-of-the-art transfer system.
Character-Level Chinese Dependency Parsing
TLDR
This paper presents novel adaptations of two major shift-reduce dependency parsing algorithms to character-level parsing, and demonstrates improved performance over word-based parsing methods on the Chinese Treebank.