A Closer Look at Skip-gram Modelling
TLDR
The amount of extra training data required to achieve skip-gram coverage using standard adjacent tri-grams is determined by computing all possible skip-grams in a training corpus and measuring how many adjacent (standard) n-grams these cover in test documents.
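The coverage measurement summarized above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes whitespace-tokenized text, treats a k-skip-n-gram as n tokens in their original order with at most k skipped tokens in total, and the function names `skipgrams`, `ngrams`, and `coverage` are hypothetical.

```python
from itertools import combinations


def skipgrams(tokens, n, k):
    """Return the set of k-skip-n-grams: n tokens kept in order,
    with at most k tokens skipped in total (assumed definition)."""
    grams = set()
    for start in range(len(tokens) - n + 1):
        # The last token may sit at most (n - 1 + k) positions after `start`.
        window_end = min(start + n + k, len(tokens))
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.add((tokens[start],) + tuple(tokens[i] for i in rest))
    return grams


def ngrams(tokens, n):
    """Return the list of adjacent (standard) n-grams."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def coverage(train_tokens, test_tokens, n, k):
    """Fraction of adjacent test n-grams that appear among the
    k-skip-n-grams of the training tokens."""
    seen = skipgrams(train_tokens, n, k)
    test = ngrams(test_tokens, n)
    if not test:
        return 0.0
    return sum(gram in seen for gram in test) / len(test)


# Toy data, invented for illustration only.
train = "the cat sat on the mat".split()
test = "the cat on the mat".split()
plain = coverage(train, test, n=3, k=0)    # adjacent tri-grams only
skipped = coverage(train, test, n=3, k=1)  # allow one skipped token
```

With `k=0` the function reduces to ordinary tri-gram coverage, so comparing the two values shows how much of the test text is covered only once skips are allowed.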
Chinese Text Classification without Automatic Word Segmentation
TLDR
This paper tests the assumption that segmentation is a necessary step for authorship attribution and topic classification in Chinese and demonstrates that it is not: a naïve character bigram model of text performs as well as models generated using a state-of-the-art automatic segmenter.
Professor or Screaming Beast? Detecting Anomalous Words in Chinese
TLDR
This paper aims to develop a system that can automatically identify anomalies in Pinyin, whether they are simple typos or intentional substitutions, and, after identifying them, suggest the correct word to be used.
Email Categorization with Tournament Methods
This article proposes tournament methods for email categorization, in which the multi-class categorization process is broken down into a set of binary classification
English Grammar Error Correction Algorithm Based on Classification Model
TLDR
The study results show that, as the number of training samples grows and the learning process advances, the proposed English grammar error correction algorithm based on the classification model continues to increase its classification accuracy, further refine its recognition rules, and gradually improve correction efficiency, thereby reducing processing time, saving storage space, and streamlining the processing flow.
Text Classification with Tournament Methods
TLDR
The use of binary classifiers in both Round Robin and Elimination tournaments is described, and both tournament methods are compared with n-way classification for determining the language of origin of speakers (both native and non-native English speakers) speaking English.
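The two tournament schemes named above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' method: `duel(a, b, x)` is a hypothetical stand-in for a trained binary classifier that returns whichever of the two classes it prefers for input `x`, ties in the round robin are broken by class order, and an unpaired class in an elimination round receives a bye.

```python
from itertools import combinations


def round_robin(classes, duel, x):
    """Every pair of classes plays once; the class with most wins is chosen.
    Ties go to the class listed first (an assumption of this sketch)."""
    wins = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        wins[duel(a, b, x)] += 1
    return max(classes, key=lambda c: wins[c])


def elimination(classes, duel, x):
    """Single-elimination bracket: losers drop out after each round.
    Expects at least one class."""
    remaining = list(classes)
    while len(remaining) > 1:
        next_round = [
            duel(remaining[i], remaining[i + 1], x)
            for i in range(0, len(remaining) - 1, 2)
        ]
        if len(remaining) % 2:
            next_round.append(remaining[-1])  # unpaired class gets a bye
        remaining = next_round
    return remaining[0]


# Toy stand-in for a binary classifier: x maps each class to a score,
# and the higher-scoring class wins the duel. Invented for illustration.
def toy_duel(a, b, x):
    return a if x[a] >= x[b] else b


labels = ["en", "fr", "de", "es"]
scores = {"en": 0.1, "fr": 0.7, "de": 0.4, "es": 0.2}
rr_winner = round_robin(labels, toy_duel, scores)
elim_winner = elimination(labels, toy_duel, scores)
```

For m classes the round robin consults m(m-1)/2 binary classifiers per input, while the elimination bracket consults only m-1, which is the main trade-off between the two schemes in this sketch.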
Chinese Pinyin-Text Conversion on Segmented Text
TLDR
This paper compares the two models and concludes that a word-based bi-gram language model achieves higher conversion accuracy than a character-based bi-gram language model.
Text Classification Method with Combination of Fuzzy Relation and Feature Distribution Variance
TLDR
It is proved that the text classification method based on FRFDV is feasible and that the accuracy of the results is higher by 2% and 4%, respectively.
Research on Text Error Correction Algorithm after Automatic Speech Recognition Based on Pragmatic Information
TLDR
The proposed combined algorithm and optimization method address the inability of traditional error correction methods to understand semantics and sentence meaning well; with text error correction, accuracy for the telephone inquiry increased by 25%.