Learn More
This paper proposes a novel approach that utilizes a machine learning method to improve pivot-based statistical machine translation (SMT). For language pairs with few bilingual data, a possible solution in pivot-based SMT using another language as a "bridge" to generate source-target translation. However, one of the weaknesses is that some useful(More)
Minimum error rate training is a popular method for parameter tuning in statistical machine translation (SMT). However, the optimization objective function may change drastically at each optimization step, which may induce MERT instability. We propose an alternative tuning method based on an ultraconservative update, in which the combination of an expected(More)
In statistical machine translation, minimum error rate training (MERT) is a standard method for tuning a single weight with regard to a given development data. However, due to the diversity and uneven distribution of source sentences, there are two problems suffered by this method. First, its performance is highly dependent on the choice of a development(More)
With the rapid development of Internet and E-commerce, the quantity of product reviews on the web grows very fast, but the review quality is inconsistent. This paper addresses the problem of automatically ranking reviews. A specification for judging the reviews quality is first defined and thus ranking review is formalized as ordinal regression problem. In(More)
To overcome the scarceness of bilingual corpora for some language pairs in machine translation, pivot-based SMT uses pivot language as a "bridge" to generate source-target translation from source-pivot and pivot-target translation. One of the key issues is to estimate the probabilities for the generated phrase pairs. In this paper, we present a novel(More)
This paper addresses the issue of text nor-malization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting 'informally inputted' text into the canonical form, by eliminating 'noises' in the text and detecting paragraph and sentence boundaries in the text. Previously, text normalization issues(More)