Corpus ID: 1842

Automatic Evaluation and Uniform Filter Cascades for Inducing N-Best Translation Lexicons

@article{Melamed1995AutomaticEA,
  title={Automatic Evaluation and Uniform Filter Cascades for Inducing N-Best Translation Lexicons},
  author={I. Dan Melamed},
  journal={ArXiv},
  year={1995},
  volume={cmp-lg/9505044}
}
This paper shows how to induce an N-best translation lexicon from a bilingual text corpus using statistical properties of the corpus together with four external knowledge sources. The knowledge sources are cast as filters, so that any subset of them can be cascaded in a uniform framework. A new objective evaluation measure is used to compare the quality of lexicons induced with different filter cascades. The best filter cascades improve lexicon quality by up to 137% over the plain vanilla… 
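To make the framework concrete, below is a minimal sketch of a uniform filter cascade in Python. The co-occurrence scores, the string-similarity "cognate" test, and the stopword test are illustrative stand-ins, not the paper's actual four knowledge sources; the point is only that any subset of filters can be composed over the same candidate set before the N best translations per source word are kept.

from difflib import SequenceMatcher
from typing import Callable, Iterable

# Candidate translation pairs with co-occurrence scores (toy data; a real
# system would derive these counts from an aligned bilingual corpus).
candidates = {
    ("government", "gouvernement"): 98,
    ("government", "le"): 40,
    ("parliament", "parlement"): 75,
    ("parliament", "de"): 22,
}

# A filter takes a candidate pair and returns True if the pair survives.
Filter = Callable[[str, str], bool]

def cognate_filter(src: str, tgt: str, threshold: float = 0.5) -> bool:
    """Keep pairs whose surface forms are similar enough (hypothetical threshold)."""
    return SequenceMatcher(None, src, tgt).ratio() >= threshold

def stopword_filter(src: str, tgt: str, stopwords=frozenset({"le", "la", "de"})) -> bool:
    """Drop candidates whose target side is a closed-class function word."""
    return tgt not in stopwords

def cascade(pairs: dict, filters: Iterable[Filter]) -> dict:
    """Apply each filter in turn; any subset of filters can be cascaded uniformly."""
    surviving = dict(pairs)
    for f in filters:
        surviving = {p: s for p, s in surviving.items() if f(*p)}
    return surviving

def n_best(pairs: dict, n: int = 1) -> dict:
    """Keep the n highest-scoring target words for each source word."""
    best = {}
    for (src, tgt), score in pairs.items():
        best.setdefault(src, []).append((score, tgt))
    return {src: [t for _, t in sorted(entries, reverse=True)[:n]]
            for src, entries in best.items()}

if __name__ == "__main__":
    filtered = cascade(candidates, [stopword_filter, cognate_filter])
    print(n_best(filtered, n=1))

In this sketch each added filter can only shrink the candidate set, so the order in which filters are cascaded changes the work done but not the final lexicon.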
Citations

Using SVMs for Filtering Translation Tables for Parallel Corpora Alignment
Translation lexicons are known to improve the quality of parallel corpora alignment at sub-sentence granularity, the quality of newly extracted translations and, as a consequence, Machine Translation…
A Comparative Study on Translation Units for Bilingual Lexicon Extraction
TLDR: Chunk-bound N-gram produces the best result in terms of both accuracy and coverage, improving accuracy by approximately 13% and coverage by 5-9% over the previously proposed baseline model.
Minimally supervised techniques for bilingual lexicon extraction
TLDR: This thesis presents a series of novel techniques for learning a translation lexicon through a minimally supervised, context-based approach, suggesting that words that are contextually relevant and occur in a similar domain are potentially useful.
Automatic construction of clean broad-coverage translation lexicons
TLDR: This lexicon-cleaning technique can produce translation lexicons with recall and precision both exceeding 90%, as well as dictionary-sized translation lexicons that are over 99% correct.
Bilingual Lexicon Induction: Effortless Evaluation of Word Alignment Tools and Production of Resources for Improbable Language Pairs
TLDR: This paper presents a simple protocol to evaluate word aligners on bilingual lexicon induction tasks from parallel corpora, and selects the most appropriate one to produce bilingual lexicons for all language pairs of this corpus.
A Generalized Constraint Approach to Bilingual Dictionary Induction for Low-Resource Language Families
TLDR: The results show that the proposed constraint-based bilingual lexicon induction for closely related languages, which extends constraints from the recent pivot-based induction technique and further enables multiple symmetry-assumption cycles to reach many more cognates in the transgraph, has the potential to complement other bilingual dictionary creation methods.
Pivot-Based Bilingual Dictionary Creation for Low-Resource Languages
TLDR: This thesis proposes an automatic method for creating bilingual dictionaries for intra-family languages by taking advantage of the limited existing language resources, which are available in different forms such as parallel corpora.
Lexical selection for machine translation
TLDR: This research addresses the problem of lexical selection of open-class lexical items in the framework of Machine Translation using a corpus-based approach, and adopts a lexicon-free approach to the selection of lexical equivalents.
Augmenting Translation Lexica by Learning Generalised Translation Patterns
TLDR: An approach that automatically induces segmentation and learns bilingual morph-like terms is explored as a phase prior to suggesting out-of-vocabulary bilingual lexicon entries, thereby saving the time involved and progressively improving alignment and extraction quality.
Automatic construction of English/Chinese parallel corpora
TLDR: An alignment method based on dynamic programming is presented to identify one-to-one Chinese and English title pairs, using the longest common subsequence (LCS) to find the most reliable Chinese translation of an English word.

References

Showing 1-10 of 18 references
Using cognates to align sentences in bilingual corpora
TLDR: It is discussed how cognates provide a cheap and reasonably reliable source of linguistic knowledge, and how better and more efficient results may be obtained by combining the two criteria of length and "cognateness".
But Dictionaries Are Data Too
TLDR: An algorithm is given for obtaining maximum likelihood estimates of the parameters of a probabilistic model from the combined data, and it is shown how these parameters are affected by inclusion of the dictionary for some sample words.
Improving Chinese Tokenization With Linguistic Filters On Statistical Lexical Acquisition
TLDR: Empirical evidence is presented for four points concerning the tokenization of Chinese text: a more rigorous "blind" evaluation methodology is needed to avoid inflated accuracy measurements, the extent of the unknown-word problem is far more serious than generally thought, and the nk-blind method is introduced.
A Program for Aligning Sentences in Bilingual Corpora
TLDR: This paper describes a method and a program for aligning sentences based on a simple statistical model of character lengths, which uses the fact that longer sentences in one language tend to be translated into longer sentences in the other language, and that shorter sentences tend to be translated into shorter sentences.
Identifying Word Correspondences in Parallel Texts
TLDR: Researchers in both machine translation and bilingual lexicography have recently become interested in studying parallel texts, bodies of text such as the Canadian Hansards which are available in multiple languages (such as French and English).
Identifying word correspondence in parallel texts
TLDR: Researchers in both machine translation and bilingual lexicography have recently become interested in studying parallel texts, bodies of text available in multiple languages; the paper outlines a self-organizing method for using these parallel texts to build a machine translation system.
The Mathematics of Statistical Machine Translation: Parameter Estimation
TLDR: It is reasonable to argue that word-by-word alignments are inherent in any sufficiently large bilingual corpus, given a set of pairs of sentences that are translations of one another.
A Simple Rule-Based Part of Speech Tagger
TLDR: This work presents a simple rule-based part-of-speech tagger which automatically acquires its rules and tags with accuracy comparable to stochastic taggers, demonstrating that the stochastic method is not the only viable approach to part-of-speech tagging.
Evaluation of Machine Translation
This paper reports results of the 1992 Evaluation of machine translation (MT) systems in the DARPA MT initiative and results of a Pre-test to the 1993 Evaluation. The DARPA initiative is unique in…
Using Bi-textual Alignment for Translation Validation: the TransCheck system
We describe the first prototype version of TransCheck, a system for automatically detecting certain types of translation errors that is based on the notion of bi-text, or aligned corpora of…