In this paper, we describe the details of ASPEC (Asian Scientific Paper Excerpt Corpus), the first large-scale parallel corpus in the scientific paper domain. ASPEC was constructed in the Japanese-Chinese machine translation project conducted between 2006 and 2010 using the Special Coordination Funds for Promoting Science and Technology. It consists(More)
This paper presents the results of the 1st Workshop on Asian Translation (WAT2014) shared tasks, which included J↔E and J↔C translation subtasks. In this first year of WAT, 12 institutions participated in the shared tasks. More than 300 translation results were submitted to the automatic evaluation server, and selected submissions(More)
Katakana, a Japanese phonogram script mainly used for loanwords, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain a Katakana word dictionary by hand. This paper proposes an automatic segmentation method of Japanese Katakana(More)
This paper presents the results of the shared tasks from the 2nd Workshop on Asian Translation (WAT2015), including J↔E and J↔C scientific paper translation subtasks and C→J and K→J patent translation subtasks. For WAT2015, 12 institutions participated in the shared tasks. About 500 translation results were submitted to the automatic evaluation server,(More)
Unknown words and word segmentation granularity are two main problems in Chinese word segmentation for Chinese-Japanese Machine Translation (MT). In this paper, we propose an approach that exploits common Chinese characters shared between Chinese and Japanese to optimize Chinese word segmentation for MT, aiming to solve these problems. We augment the(More)
Word sequential alignment models work well for similar language pairs, but they are quite inadequate for distant language pairs. It is difficult to align words or phrases of distant languages with high accuracy without structural information of the sentences. In this paper, we propose a Bayesian subtree alignment model that incorporates dependency relations(More)
We present a high-precision, language-independent transliteration framework applicable to bilingual lexicon extraction. Our approach is to employ a bilingual topic model to enhance the output of a state-of-the-art grapheme-based transliteration baseline. We demonstrate that this method is able to extract a high-quality bilingual lexicon from a comparable(More)
In the literature, two main categories of methods have been proposed for bilingual lexicon extraction from comparable corpora, namely topic-model-based and context-based methods. In this paper, we present a bilingual lexicon extraction system that is based on a novel combination of these two methods in an iterative process. Our system does not rely on any prior(More)
One of the main issues in word alignment is the difficulty of handling function words that have no direct translations, which we call unique function words. They are often incorrectly aligned to words in the other language. This problem is prominent in language pairs with very different sentence structures. In this paper, we propose a novel approach(More)