Transliteration Generation and Mining with Limited Training Resources
@inproceedings{Jiampojamarn2010TransliterationGA,
  title={Transliteration Generation and Mining with Limited Training Resources},
  author={Sittichai Jiampojamarn and Kenneth Dwyer and Shane Bergsma and Aditya Bhargava and Qing Dou and Mi-Young Kim and Grzegorz Kondrak},
  booktitle={NEWS@ACL},
  year={2010}
}
We present DirecTL+: an online discriminative sequence prediction model based on many-to-many alignments, which is further augmented by the incorporation of joint n-gram features. Experimental results show improvement over the results achieved by DirecTL in 2009. We also explore a number of diverse resource-free and language-independent approaches to transliteration mining, which range from simple to sophisticated.
44 Citations
A Statistical Model for Unsupervised and Semi-supervised Transliteration Mining
- Computer Science
- ACL
- 2012
A novel model that is efficient and language-pair independent mines transliteration pairs in a consistent fashion in both unsupervised and semi-supervised settings, automatically extracting transliteration pairs from parallel corpora.
Robust Transliteration Mining from Comparable Corpora with Bilingual Topic Models
- Computer Science
- IJCNLP
- 2013
It is demonstrated that this method is able to extract a high-quality bilingual lexicon from a comparable corpus, and the topic model is extended to propose a solution to the out-of-domain problem.
Bootstrapping Transliteration with Constrained Discovery for Low-Resource Languages
- Computer Science
- EMNLP
- 2018
This work presents a bootstrapping algorithm that uses constrained discovery to improve generation and can be used with as few as 500 training examples, which are shown to be obtainable from annotators in a matter of hours.
Leveraging supplementary transcriptions and transliterations via re-ranking
- Computer Science
- 2011
This thesis presents a unified method for leveraging related transliterations or transcription data to improve the performance of a base G2P or machine transliteration system.
Low-Resource G2P and P2G Conversion with Synthetic Training Data
- Computer Science
- SIGMORPHON
- 2020
A method for synthesizing training data using a combination of diverse models is proposed, and experiments are run with three transduction approaches in both standard and low-resource settings, as well as on the related task of phoneme-to-grapheme conversion.
Leveraging supplemental representations for sequential transduction
- Computer Science
- NAACL
- 2012
A unified reranking approach is applied to both grapheme-to-phoneme conversion and machine transliteration, demonstrating substantial accuracy improvements by utilizing heterogeneous transliterations and transcriptions of the input word.
A Comparative Study of Extremely Low-Resource Transliteration of the World's Languages
- Computer Science
- LREC
- 2018
In this extremely low-resource task of transliterating around 1000 Bible names from 591 languages into English, it is found that a phrase-based MT system performs much better than other methods, including a g2p system and a neural MT system.
Comparison of Assorted Models for Transliteration
- Psychology, Computer Science
- NEWS@ACL
- 2018
A combination of discriminative, generative, and neural models obtains the best results on the development sets of three neural MT models used in the NEWS 2018 Shared Task on Transliteration.
A Bayesian Alignment Approach to Transliteration Mining
- Computer Science
- TALIP
- 2013
In this article we present a technique for mining transliteration pairs using a set of simple features derived from a many-to-many bilingual forced-alignment at the grapheme level to classify…
Leveraging Transliterations from Multiple Languages
- Computer Science
- NEWS@IJCNLP
- 2011
This paper proposes a re-ranking method with features based on n-gram alignments as well as system and alignment scores that achieves a relative improvement of over 10% over the base system used on its own and to system combination.
References
DirecTL: a Language Independent Approach to Transliteration
- Computer Science
- NEWS@IJCNLP
- 2009
DirecTL is an online discriminative sequence prediction model that employs a many-to-many alignment between target and source and is able to independently discover many of the language-specific regularities in the training data.
Transliteration as Constrained Optimization
- Computer Science
- EMNLP
- 2008
It is shown that the transliteration problem can be formulated as a constrained optimization problem and can thus take into account contextual dependencies and constraints among character bi-grams in the two strings.
Weakly Supervised Named Entity Transliteration and Discovery from Multilingual Comparable Corpora
- Computer Science
- ACL
- 2006
An (almost) unsupervised learning algorithm for automatic discovery of Named Entities (NEs) in a resource-free language, given a bilingual corpus in which it is weakly temporally aligned with a resource-rich language.
Joint Processing and Discriminative Training for Letter-to-Phoneme Conversion
- Computer Science
- ACL
- 2008
The key idea is online discriminative training, which updates parameters according to a comparison of the current system output to the desired output, allowing the model to train all of its components together.
Applying Many-to-Many Alignments and Hidden Markov Models to Letter-to-Phoneme Conversion
- Computer Science
- NAACL
- 2007
This work presents a novel technique of training with many-to-many alignments of letters and phonemes, and applies an HMM method in conjunction with a local classification model to predict a global phoneme sequence given a word.
Alignment-Based Discriminative String Similarity
- Computer Science
- ACL
- 2007
This work proposes an alignment-based discriminative framework for string similarity that achieves exceptional performance; on nine separate cognate identification experiments using six language pairs, it more than doubles the precision of traditional orthographic measures like Longest Common Subsequence Ratio and Dice's Coefficient.
A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance
- Computer Science
- UAI
- 2005
This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings, trained on both positive and negative instances of string pairs.
Models of translation equivalence among words
- Computer Science
- CL
- 2000
This article presents methods for biasing statistical translation models to reflect bitext properties, and shows how a statistical translation model can take advantage of preexisting knowledge that might be available about particular language pairs.
Learning String-Edit Distance
- Computer Science
- IEEE Trans. Pattern Anal. Mach. Intell.
- 1998
The stochastic model allows us to learn a string-edit distance function from a corpus of examples and is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
Adaptive duplicate detection using learnable string similarity measures
- Computer Science
- KDD '03
- 2003
This paper proposes to employ learnable text distance functions for each database field, and shows that such measures are capable of adapting to the specific notion of similarity that is appropriate for the field's domain.