A Statistical Approach to Machine Translation


In this paper, we present a statistical approach to machine translation. We describe the application of our approach to translation from French to English and give preliminary results.
The field of machine translation is almost as old as the modern digital computer. In 1949 Warren Weaver suggested that the problem be attacked with statistical methods and ideas from information theory, an area which he, Claude Shannon, and others were developing at the time (Weaver 1949). Although researchers quickly abandoned this approach, advancing numerous theoretical objections, we believe that the true obstacles lay in the relative impotence of the available computers and the dearth of machine-readable text from which to gather the statistics vital to such an attack. Today, computers are five orders of magnitude faster than they were in 1950 and have hundreds of millions of bytes of storage. Large, machine-readable corpora are readily available. Statistical methods have proven their value in automatic speech recognition (Bahl et al. 1983) and have recently been applied to lexicography (Sharman et al. 1988). We feel that it is time to give them a chance in machine translation.
The job of a translator is to render in one language the meaning expressed by a passage of text in another language. This task is not always straightforward. For example, the translation of a word may depend on words quite far from it. Some English translators of Proust's seven-volume work A la Recherche du Temps Perdu have striven to make the first word of the first volume the same as the last word of the last volume because the French original begins and ends with the same word (Bernstein 1988). Thus, in its most highly developed form, translation involves a careful study of the original text and may even encompass a detailed analysis of the author's life and circumstances. We, of course, do not hope to reach these pinnacles of the translator's art.
In this paper, we consider only the translation of individual sentences. Usually, there are many acceptable translations of a particular sentence, the choice among them being largely a matter of taste. We take the view that every sentence in one language is a possible translation of any sentence in the other. We assign to every pair of sentences (S, T) a probability, Pr(T|S), to be interpreted as the probability that a translator will produce T in the target language when presented …

