• Corpus ID: 15195083

A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives

@article{Sim2014AMA,
  title={A Morphological Analyzer for Japanese Nouns, Verbs and Adjectives},
  author={Yanchuan Sim},
  journal={ArXiv},
  year={2014},
  volume={abs/1410.0291}
}
We present an open-source morphological analyzer for Japanese nouns, verbs and adjectives. The system builds on the morphological analysis capabilities of MeCab and adds finer-grained classification attributes such as politeness, tense, mood and voice. We implemented the analyzer as a finite-state transducer using FOMA, an open-source finite-state compiler toolkit. The source code and tool are available at this https URL
KNU-HYUNDAI’s NMT system for Scientific Paper and Patent Tasks on WAT 2019
TLDR
The transformer-based NMT system submitted by the Kangwon National University and HYUNDAI team to the translation tasks of the 6th Workshop on Asian Translation (WAT 2019) performed well in both tasks and was ranked first in terms of BLEU score in all the JPC2 subtasks the authors participated in.
Mining Twitter Data for Landslide Events Reported Worldwide
TLDR
Multilingual support is based on the first unified cross-lingual dataset of word vectors for representing texts in multiple languages and outperforms the “native” and “translated” approaches based on monolingual word vectors.

References

JMdict: a Japanese-Multilingual Dictionary
The JMdict project has at its aim the compilation of a multilingual lexical database with Japanese as the pivot language. Using an XML structure designed to cater for a mix of languages and a rich
Applying Conditional Random Fields to Japanese Morphological Analysis
TLDR
This paper shows how CRFs can be applied to situations where word boundary ambiguity exists, and confirms that CRFs offer a solution to the long-standing problems in corpus-based or statistical Japanese morphological analysis.
Compilation of a multilingual parallel corpus
TLDR
A new method based on the extraction of data from Japanese-English bilingual newspaper articles and broadcast media news reports published on the WWW is proposed.
Foma: a Finite-State Compiler and Library
TLDR
Foma is a compiler, programming language, and C library for constructing finite-state automata and transducers for various uses and embraces Unicode fully and supports various different formats for specifying regular expressions.
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
TLDR
This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
User’s Guide for the JUMAN System, a User-Extensible Morphological Analyzer for Japanese
  • 1991
Yuji Matsumoto, Sadao Kurohashi, Yutaka Nyoki, Hitoshi Shinho, and Makoto Nagao