CLUE: A Chinese Language Understanding Evaluation Benchmark
The first large-scale Chinese Language Understanding Evaluation (CLUE) benchmark is introduced, an open-ended, community-driven project that brings together 9 tasks spanning several well-established single-sentence/sentence-pair classification tasks, as well as machine reading comprehension, all on original Chinese text.
Probing Natural Language Inference Models through Semantic Fragments
This work proposes the use of semantic fragments—systematically generated datasets that each target a different semantic phenomenon—for probing, and efficiently improving, such capabilities of linguistic models.
MonaLog: a Lightweight System for Natural Language Inference Based on Monotonicity
- Hai Hu, Qi Chen, Kyle Richardson, A. Mukherjee, L. Moss, Sandra Kübler
- Computer Science · SCiL
- 19 October 2019
It is shown that MonaLog is capable of generating large amounts of high-quality training data for BERT, improving its accuracy on SICK, and that it can be used in combination with the current state-of-the-art model BERT in a variety of settings, including compositional data augmentation.
FewCLUE: A Chinese Few-shot Learning Evaluation Benchmark
This work introduces the Chinese Few-shot Learning Evaluation Benchmark (FewCLUE), the first comprehensive few-shot evaluation benchmark in Chinese, implements a set of state-of-the-art few-shot learning methods (including PET, ADAPET, LM-BFF, P-tuning and EFL), and compares their performance with fine-tuning and zero-shot learning schemes on the newly constructed FewCLUE benchmark.
OCNLI: Original Chinese Natural Language Inference
- Hai Hu, Kyle Richardson, Liang Xu, Lu Li, Sandra Kübler, L. Moss
- Computer Science · Findings of EMNLP
- 12 October 2020
This paper presents the first large-scale NLI dataset for Chinese called the Original Chinese Natural Language Inference dataset (OCNLI), which follows closely the annotation protocol used for MNLI, but creates new strategies for eliciting diverse hypotheses.
Investigating translated Chinese and its variants using machine learning
The results show that Chinese translations as a whole can be reliably distinguished from non-translations, even based on only five features, and that typological traces of the source languages can often be found in their translations, thereby creating what the authors call dialects of translationese.
Light Pre-Trained Chinese Language Model for NLP Tasks
Preliminary results from the Free Linguistic Environment project
- D. Cavar, Lwin Moe, Hai Hu, K. Steimel
- Computer Science · Proceedings of the International Conference on…
- 16 December 2016
FLE enables various forms of probabilistic modeling of c-structures and f-structures for input or output sentences that go beyond the capabilities of other technologies based on the LFG framework.
Building a Treebank for Chinese Literature for Translation Studies
We present a new Chinese Treebank in the literary domain, the Treebank for Chinese Literature (TCL), with an aim to foster translation studies by providing an annotated collection of Chinese texts…