Unsupervised Multilingual Learning for POS Tagging

@inproceedings{Snyder2008UnsupervisedML,
  title={Unsupervised Multilingual Learning for POS Tagging},
  author={Benjamin Snyder and Tahira Naseem and Jacob Eisenstein and Regina Barzilay},
  booktitle={EMNLP},
  year={2008}
}
We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The key hypothesis of multilingual learning is that by combining cues from multiple languages, the structure of each becomes more apparent. We formulate a hierarchical Bayesian model for jointly predicting bilingual streams of part-of-speech tags. The model learns language-specific features while capturing cross-lingual patterns in tag distribution for aligned words. Once the parameters of our…
Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches
TLDR
This work considers two ways of applying this intuition to the problem of unsupervised part-of-speech tagging: a model that directly merges tag structures for a pair of languages into a single sequence and a second model which instead incorporates multilingual context using latent variables.
Unsupervised Multilingual Grammar Induction
TLDR
A generative Bayesian model is formulated which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters, and loosely binds parallel trees while allowing language-specific syntactic structure.
Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: a Bayesian Non-Parametric Approach
TLDR
A non-parametric Bayesian model is proposed that connects related tagging decisions across languages through the use of multilingual latent variables and shows that performance improves steadily as the number of languages increases.
Crosslingual Induction of Semantic Roles
TLDR
This work considers unsupervised induction of semantic roles from sentences annotated with automatically-predicted syntactic dependency representations and uses a state-of-the-art generative Bayesian non-parametric model to do so.
Wiki-ly Supervised Part-of-Speech Tagging
TLDR
This paper shows that it is possible to build POS-taggers exceeding state-of-the-art bilingual methods by using simple hidden Markov models and a freely available and naturally growing resource, the Wiktionary.
Simpler unsupervised POS tagging with bilingual projections
TLDR
An unsupervised approach to part-of-speech tagging based on projections of tags in a word-aligned bilingual parallel corpus is presented, which automatically identifies “good” training sentences from the parallel corpus and applies self-training.
Ten Pairs to Tag – Multilingual POS Tagging via Coarse Mapping between Embeddings
TLDR
It is demonstrated that accurate multilingual part-of-speech (POS) tagging can be done with just a few (e.g., ten) word translation pairs, and the generated tags are used to predict typological properties of languages, obtaining a 50% error reduction relative to the prototype model.
Part-of-Speech Tag Disambiguation by Cross-Linguistic Majority Vote
TLDR
This paper improves the Macedonian training set for supervised part-of-speech tagging by transferring available manual annotations from a number of similar languages, based on multilingual parallel corpora, automatic word alignment, and a set of rules.
Unsupervised multilingual learning
TLDR
A class of probabilistic models that exploit deep links among human languages as a form of naturally occurring supervision allows us to substantially improve performance for core text processing tasks, such as morphological segmentation, part-of-speech tagging, and syntactic parsing.
Increasing the Quality and Quantity of Source Language Data for Unsupervised Cross-Lingual POS Tagging
TLDR
This paper considers the task of unsupervised cross-lingual POS tagging, and constructs a model that predicts the best source language for a given target language, and shows that this model improves on using a single fixed source language.

References

SHOWING 1-10 OF 29 REFERENCES
Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection Across Aligned Corpora
This paper investigates the potential for projecting linguistic annotations including part-of-speech tags and base noun phrase bracketings from one language to another via automatically word-aligned…
Unsupervised Multilingual Learning for Morphological Segmentation
TLDR
A nonparametric Bayesian model is presented that jointly induces morpheme segmentations of each language under consideration and at the same time identifies cross-lingual morpheme patterns, or abstract morphemes, of multiple languages.
Part-of-Speech Tagging in Context
TLDR
A new HMM tagger is presented that exploits context on both sides of a word to be tagged, and it is shown how this new tagger achieves state-of-the-art results in a supervised, non-training intensive framework.
A Backoff Model for Bootstrapping Resources for Non-English Languages
TLDR
This paper proposes a novel approach of combining a bootstrapped resource with a small amount of manually annotated data and shows that this approach achieves a significant improvement over EM and self-training and systems that are only trained on manual annotations.
A fully Bayesian approach to unsupervised part-of-speech tagging
TLDR
This model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE.
Prototype-Driven Learning for Sequence Models
TLDR
This work investigates prototype-driven learning for primarily unsupervised sequence modeling, where prior knowledge is specified declaratively, by providing a few canonical examples of each target annotation label, then propagated across a corpus using distributional similarity features in a log-linear generative model.
Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora
TLDR
Noise-robust tagger, bracketer and lemmatizer training procedures capable of accurate system bootstrapping from noisy and incomplete initial projections are presented, with performance that significantly exceeds that obtained by direct annotation projection.
An Unsupervised Method for Word Sense Tagging using Parallel Corpora
TLDR
An unsupervised method for word sense disambiguation that exploits translation correspondences in parallel corpora is presented, using pseudo-translations, created by machine translation systems, in order to make possible the evaluation of the approach against a standard test set.
A Bayesian LDA-based model for semi-supervised part-of-speech tagging
TLDR
A novel Bayesian model for semi-supervised part-of-speech tagging that outperforms the best previously proposed model for this task on a standard dataset and introduces a model for determining the set of possible tags of a word which captures important dependencies in the ambiguity classes of words.
Inducing a Multilingual Dictionary from a Parallel Multitext in Related Languages
TLDR
This work builds a multilingual dictionary induction system for a family of related resource-poor languages that assumes only the presence of a single medium-length multitext (the Bible).