A Statistical Approach to Machine Translation


The field of machine translation is almost as old as the modern digital computer. In 1949 Warren Weaver suggested that the problem be attacked with statistical methods and ideas from information theory, an area which he, Claude Shannon, and others were developing at the time (Weaver 1949). Although researchers quickly abandoned this approach, advancing numerous theoretical objections, we believe that the true obstacles lay in the relative impotence of the available computers and the dearth of machine-readable text from which to gather the statistics vital to such an attack. Today, computers are five orders of magnitude faster than they were in 1950 and have hundreds of millions of bytes of storage. Large, machine-readable corpora are readily available. Statistical methods have proven their value in automatic speech recognition (Bahl et al. 1983) and have recently been applied to lexicography (Sinclair 1985) and to natural language processing (Baker 1979; Ferguson 1980; Garside et al. 1987; Sampson 1986; Sharman et al. 1988). We feel that it is time to give them a chance in machine translation.

The job of a translator is to render in one language the meaning expressed by a passage of text in another language. This task is not always straightforward. For example, the translation of a word may depend on words quite far from it. Some English translators of Proust's seven-volume work À la recherche du temps perdu have striven to make the first word of the first volume the same as the last word of the last volume because the French original begins and ends with the same word (Bernstein 1988). Thus, in its most highly developed form, translation involves a careful study of the original text and may even encompass a detailed analysis of the author's life and circumstances. We, of course, do not hope to reach these pinnacles of the translator's art. In this paper, we consider only the translation of individual sentences.
Usually, there are many acceptable translations of a particular sentence, the choice among them being largely a matter of taste. We take the view that every sentence in one language is a possible translation of any sentence in the other. We assign to every pair of sentences (S, T) a probability, Pr(T | S), to be interpreted as the probability that a translator will produce T in the target language when presented with S in the source language. We expect Pr(T | S) to be very small for pairs like (Le matin je me brosse les dents | President Lincoln was a good lawyer) and relatively large for pairs like (Le président Lincoln était un bon avocat | President Lincoln was a good lawyer). We then view the problem of machine translation as follows. Given a sentence T in the target language, we seek the sentence S from which the translator produced T. We know that our chance of error is minimized by choosing that sentence S that is most probable given T. Thus, we wish to choose S so as to maximize Pr(S | T). Using Bayes' theorem, we can write

Pr(S | T) = Pr(S) Pr(T | S) / Pr(T).
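To make the decision rule concrete: because Pr(T) does not depend on S, choosing the S that maximizes Pr(S | T) amounts to choosing the S that maximizes the product Pr(S) Pr(T | S). The following is a minimal sketch of that rule over a fixed candidate list, not the authors' system. The function name `decode`, the second English candidate, and every probability value are invented for illustration, and the sketch assumes, from the way the example pairs are written, that T is the French sentence and S the English one.

```python
import math

def decode(target, candidates, language_model, translation_model):
    """Return the candidate S maximizing log Pr(S) + log Pr(T | S).

    Pr(T) is omitted because it is the same for every candidate S.
    """
    def score(source):
        prior = language_model[source]                    # Pr(S)
        likelihood = translation_model[(target, source)]  # Pr(T | S)
        return math.log(prior) + math.log(likelihood)
    return max(candidates, key=score)

# Observed sentence T (assumed here to be the French side of the pair).
T = "Le président Lincoln était un bon avocat"

# Hypothetical candidate sentences S.
candidates = [
    "President Lincoln was a good lawyer",
    "In the morning I brush my teeth",
]

# Invented values of Pr(S); the unrelated sentence is given the larger prior
# on purpose, to show that the translation term can outweigh it.
language_model = {
    candidates[0]: 1e-9,
    candidates[1]: 1e-8,
}

# Invented values of Pr(T | S): relatively large for the matching pair,
# very small for the unrelated pair.
translation_model = {
    (T, candidates[0]): 1e-3,
    (T, candidates[1]): 1e-15,
}

best = decode(T, candidates, language_model, translation_model)
print(best)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. A practical system could not enumerate every sentence S as this toy does; it would need a search procedure over candidate sentences.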

Cite this paper

@article{Brown1990ASA, title={A Statistical Approach to Machine Translation}, author={Peter F. Brown and John Cocke and Stephen Della Pietra and Vincent J. Della Pietra and Frederick Jelinek and John D. Lafferty and Robert L. Mercer and Paul S. Roossin}, journal={Computational Linguistics}, year={1990}, volume={16}, pages={79-85} }