Katakana, the Japanese syllabary used mainly for loanwords, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domain-dependent and there are many Katakana neologisms, it is almost impossible to construct and maintain a Katakana word dictionary by hand. This paper proposes an automatic segmentation method for Japanese Katakana…
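Before any Katakana compound can be split, a segmenter has to locate the Katakana sequences themselves, and these runs are often long compounds (for example ウェブアプリケーション, "web application") that no hand-built dictionary covers. The following is a minimal sketch, assuming input in full-width Katakana: it only extracts maximal Katakana runs with a Unicode-range regular expression and does not perform the segmentation method the paper proposes. The name `katakana_runs` is hypothetical.

```python
import re

# Full-width Katakana letters U+30A1..U+30FA plus the prolonged sound mark
# "ー" (U+30FC). The middle dot "・" (U+30FB) is deliberately excluded, so it
# acts as a word separator. Half-width Katakana is not handled in this sketch.
KATAKANA_RUN = re.compile(r"[\u30A1-\u30FA\u30FC]+")


def katakana_runs(text: str) -> list[str]:
    """Return the maximal runs of full-width Katakana found in text (hypothetical helper)."""
    return KATAKANA_RUN.findall(text)


if __name__ == "__main__":
    # Each run is a candidate that may still need to be split into words.
    print(katakana_runs("ウェブアプリケーションを開発した"))
```

A run such as ウェブアプリケーション is returned as a single unit; deciding that it should be split into ウェブ and アプリケーション is exactly the problem the abstract describes.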
Unknown words and word segmentation granularity are the two main problems in Chinese word segmentation for Chinese–Japanese machine translation (MT). In this paper, we propose an approach that exploits the common Chinese characters shared by Chinese and Japanese to optimize Chinese word segmentation for MT, aiming to solve these problems. We augment the…
This paper presents the results of the shared tasks of the 1st Workshop on Asian Translation (WAT2014), which included J↔E and J↔C translation subtasks. In this first year of WAT, 12 institutions participated in the shared tasks. More than 300 translation results were submitted to the automatic evaluation server, and selected submissions…
This paper presents the results of the shared tasks of the 2nd Workshop on Asian Translation (WAT2015), including J↔E and J↔C scientific paper translation subtasks and C→J and K→J patent translation subtasks. For WAT2015, 12 institutions participated in the shared tasks. About 500 translation results were submitted to the automatic evaluation server,…
We present a high-precision, language-independent transliteration framework applicable to bilingual lexicon extraction. Our approach is to employ a bilingual topic model to enhance the output of a state-of-the-art grapheme-based transliteration baseline. We demonstrate that this method is able to extract a high-quality bilingual lexicon from a comparable…
Sequential word alignment models work well for similar language pairs, but they are quite inadequate for distant language pairs. It is difficult to align words or phrases of distant languages with high accuracy without structural information about the sentences. In this paper, we propose a Bayesian subtree alignment model that incorporates dependency relations…
In the literature, two main categories of methods have been proposed for bilingual lexicon extraction from comparable corpora, namely topic-model-based and context-based methods. In this paper, we present a bilingual lexicon extraction system based on a novel combination of these two methods in an iterative process. Our system does not rely on any prior…
One of the main issues in word alignment is the difficulty of handling function words that have no direct translation, which we call unique function words. They are often incorrectly aligned to words in the other language. This problem is especially prominent in language pairs with very different sentence structures. In this paper, we propose a novel approach…
Parallel sentences are crucial for statistical machine translation (SMT). However, they are quite scarce for most language pairs, such as Chinese–Japanese. Many studies have been conducted on extracting parallel sentences from noisy parallel or comparable corpora. We extract Chinese–Japanese parallel sentences from quasi-comparable corpora, which are…