A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora

@inproceedings{Fung1998ASV,
  title={A Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora},
  author={Pascale Fung},
  booktitle={AMTA},
  year={1998}
}
We present two problems in statistically extracting bilingual lexicons: (1) How can noisy parallel corpora be used? (2) How can non-parallel yet comparable corpora be used? We describe our own work and contribution in relaxing the constraint of using only clean parallel corpora. DKvec is a method for extracting bilingual lexicons from noisy parallel corpora based on the arrival distances of words. Using DKvec on noisy parallel corpora in English/Japanese and English…
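The abstract only names the idea behind DKvec. The Python below is therefore a minimal sketch under stated assumptions, not the author's implementation.

Each word is represented by its arrival-distance vector: the gaps between its successive occurrences in one half of the corpus. A source vector is then compared with each candidate target vector using dynamic time warping (DTW). DTW tolerates the insertions and deletions that noisy parallel text introduces.

Several details are assumptions made for illustration:
- the use of DTW,
- the absolute-difference cost and the length normalization,
- the helper names `arrival_distances`, `dtw_distance`, and `rank_candidates`,
- the toy corpora.

```python
from typing import List, Sequence


def arrival_distances(tokens: Sequence[str], word: str) -> List[int]:
    """Gaps between successive occurrences of `word` in one half of a bitext."""
    positions = [i for i, tok in enumerate(tokens) if tok == word]
    return [b - a for a, b in zip(positions, positions[1:])]


def dtw_distance(x: Sequence[int], y: Sequence[int]) -> float:
    """Dynamic time warping cost between two arrival-distance vectors.

    Assumption: absolute difference as the local cost, normalised by the
    combined vector length. Empty vectors are treated as incomparable.
    """
    if not x or not y:
        return float("inf")
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest predecessor cell: advance x, advance y, or both.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def rank_candidates(src_tokens: Sequence[str], tgt_tokens: Sequence[str],
                    src_word: str, tgt_vocab: Sequence[str]) -> List[str]:
    """Order target words by how closely their arrival pattern matches src_word's."""
    src_vec = arrival_distances(src_tokens, src_word)
    scored = []
    for w in tgt_vocab:
        score = dtw_distance(src_vec, arrival_distances(tgt_tokens, w))
        if score != float("inf"):  # skip words seen fewer than twice
            scored.append((score, w))
    scored.sort()
    return [w for _, w in scored]


if __name__ == "__main__":
    # Hypothetical toy "bitext"; real input would be tokenised corpus halves.
    src = "a b x c d x e x".split()
    tgt = "p y q r y s y t z z".split()
    print(rank_candidates(src, tgt, "x", sorted(set(tgt))))  # ['y', 'z']
```

In the toy run, `y` ranks first because its gap pattern [3, 2] matches that of `x` exactly. Words that occur only once yield no arrival distances and are left out.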
Highly Influential
This paper has highly influenced 21 other papers.
Highly Cited
This paper has 259 citations.

Key Quantitative Results

  • Using DKvec on noisy parallel corpora in English/Japanese and English/Chinese, our evaluations show 55.35% precision on a small corpus and 89.93% precision on a larger corpus.
  • We show precision ranging from 30% to 76% as the number of translation candidates considered grows from the top one to the top 20.
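The top-N figures above can be read as precision at N. The sketch below assumes the usual definition, which the page itself does not spell out: a source word counts as correct if any reference translation appears among its N highest-ranked candidates.

The function name `precision_at_n` and the toy lexicon are hypothetical. They are not the paper's evaluation setup or data.

```python
from typing import Dict, List, Set


def precision_at_n(ranked: Dict[str, List[str]],
                   gold: Dict[str, Set[str]],
                   n: int) -> float:
    """Fraction of evaluated source words with a correct translation in their top-n candidates.

    Assumption: only source words that have a gold entry are evaluated.
    """
    evaluated = [w for w in ranked if w in gold]
    if not evaluated:
        return 0.0
    # A hit is any overlap between the gold set and the first n candidates.
    hits = sum(1 for w in evaluated if gold[w] & set(ranked[w][:n]))
    return hits / len(evaluated)


if __name__ == "__main__":
    # Hypothetical toy lexicon, not data from the paper.
    ranked = {
        "bank": ["banque", "rive"],
        "river": ["rive", "fleuve"],
        "money": ["banque", "argent"],
    }
    gold = {"bank": {"banque"}, "river": {"fleuve"}, "money": {"argent"}}
    for n in (1, 2):
        print(n, precision_at_n(ranked, gold, n))
```

On this toy data, top-1 precision is 1/3 and top-2 precision is 1.0. This shows how widening the candidate list raises the measured precision, the same trend as the 30% to 76% range reported above.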
