This paper introduces our SAU-KERC system, which achieved an F1 score of 0.39 in the word-level quality estimation task at WMT2015. The goal is to assign each translated word an “OK” or “BAD” label indicating translation quality. We adopt a sequence labeling model, conditional random fields (CRF), to predict the labels. Since “BAD” labels are rare in the…
Patents cover almost all of the latest and most active innovative technical information across technical fields, so patent classification has great application value in the patent research domain. This paper presents a KNN text categorization method based on shared nearest neighbors, effectively combining the BM25 similarity calculation method and the…
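As a rough illustration of the BM25 similarity named in this abstract, the sketch below scores tokenized documents against a tokenized query and feeds the scores into a plain majority-vote KNN. It is not the paper's method: the shared-nearest-neighbor step is not described in the visible text and is omitted, and the function names `bm25_scores` and `knn_label` and the defaults `k1=1.5`, `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per tokenized document for a tokenized query.

    k1 and b are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def knn_label(query, docs, labels, k=3):
    """Plain KNN: majority vote over the k highest-scoring documents."""
    scores = bm25_scores(query, docs)
    top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    votes = Counter(labels[i] for i in top)
    return votes.most_common(1)[0][0]
```

Here BM25 plays the role of the similarity measure that a distance metric would play in a conventional KNN classifier; a higher score means a closer neighbor.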
Latent Semantic Analysis (LSA) is a technique for analyzing latent concepts. LSA is based on the Vector Space Model (VSM) and statistics, and it usually takes Singular Value Decomposition (SVD) as its kernel algorithm. LSA typically increases the scale of the training data to improve system performance. However, as it needs many extra…
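The core of LSA as described here, a term-document matrix from the VSM reduced by a truncated SVD, can be sketched in a few lines. This is a minimal sketch using NumPy, not the paper's system; the function name `lsa` and the choice to scale both term and document vectors by the singular values are assumptions made for illustration.

```python
import numpy as np


def lsa(term_doc, k):
    """Project terms and documents into a k-dimensional latent space.

    term_doc: array of shape (n_terms, n_docs), e.g. raw or weighted counts.
    Returns (term_vecs, doc_vecs) with shapes (n_terms, k) and (n_docs, k).
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the k largest singular values and scale each latent axis by them.
    term_vecs = U[:, :k] * s[:k]
    doc_vecs = Vt[:k, :].T * s[:k]
    return term_vecs, doc_vecs
```

Documents can then be compared by cosine similarity between rows of `doc_vecs`, which groups documents that share latent concepts even when they share few literal terms.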
Topic tracking technology can help people find what they are interested in within the vast sea of information. Since topics develop dynamically, the topic excursion problem may appear in the tracking process. To overcome this problem and the shortcomings of current adaptive methods, we propose a new adaptive method for topic tracking. We call it time adaptive…
A patent includes a great deal of practical technical information and plays an important role in promoting scientific development. Research on patent classification and retrieval has significant application value. A patent is a special technical text with a strict hierarchical classification system and a normalized structure, and there are a number of…
To address the problem of rendering Katakana back into English in Japanese-English translation, we employ a phrase-based statistical machine translation model to perform Katakana phrase (or word) translation from Japanese to English. The Katakana phrase is segmented into words by CRF, and then Japanese-English and English-Japanese bi-directional integration…
This paper proposes an NP tree matching method for translating English-Chinese patent titles. First, a bilingual example database for patent titles is built. English parse trees are produced by an English parser, forming an NP tree database. The input patent title to be translated is first parsed into a tree. Then NP trees are searched for…
Bilingual resources are indispensable in NLP fields such as machine translation. A large amount of electronic information is available on the Internet, which can serve as a potential source of large-scale multilingual corpora, so it is feasible to mine a large volume of true bilingual…
In this paper, a Chinese dependency parsing method is proposed based on an improved Maximum Spanning Tree (MST) parser. In this method, a dependency direction discrimination model and a head POS recognition model are used to modify the weights of directed edges in the MST model, and then the Eisner algorithm is used to search for and generate the dependency trees.
Latent Semantic Indexing (LSI) is an effective feature extraction method that has been applied to many text learning tasks, such as text clustering and information retrieval. This paper thoroughly analyses the influence of term co-occurrences on the LSI mapping and proposes a method named pseudo document which…