Corpus ID: 8396953

Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

@inproceedings{Das2011UnsupervisedPT,
  title={Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections},
  author={Dipanjan Das and Slav Petrov},
  booktitle={ACL},
  year={2011}
}
We describe a novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language. Our method does not assume any knowledge about the target language (in particular no tagging dictionary is assumed), making it applicable to a wide array of resource-poor languages. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as features in an unsupervised… 
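The abstract describes graph-based label propagation as the mechanism for cross-lingual knowledge transfer: tag distributions projected onto some target-language vertices via word alignments are smoothed over a similarity graph so that unlabeled vertices inherit labels from their neighbors. A minimal sketch of that general technique (not the paper's exact formulation; the toy graph, vertex names, and the `keep` interpolation weight are illustrative assumptions):

```python
# Minimal sketch of graph-based label propagation for cross-lingual
# POS projection. The graph, seeds, and hyperparameters are toy
# illustrations, not values from the paper.
import numpy as np

TAGS = ["NOUN", "VERB", "ADJ"]

# Similarity graph over target-language word types (hypothetical).
# edges[i][j] = similarity weight between vertex i and vertex j.
edges = {
    0: {1: 1.0},            # vertex 0: seeded via a word alignment
    1: {0: 1.0, 2: 0.5},    # vertex 1: unlabeled, close to vertex 0
    2: {1: 0.5},            # vertex 2: unlabeled
}

# Seed tag distributions projected from the source side (one-hot here).
seeds = {0: np.array([1.0, 0.0, 0.0])}

def propagate(edges, seeds, n_tags, iters=30, keep=0.5):
    """Iteratively average each vertex's tag distribution with its
    neighbors', pulling seeded vertices back toward their seeds."""
    n = len(edges)
    q = np.full((n, n_tags), 1.0 / n_tags)  # start from uniform
    for _ in range(iters):
        new_q = np.zeros_like(q)
        for i, nbrs in edges.items():
            total = sum(nbrs.values())
            for j, w in nbrs.items():
                new_q[i] += (w / total) * q[j]   # neighbor average
            if i in seeds:                       # seed interpolation
                new_q[i] = keep * seeds[i] + (1 - keep) * new_q[i]
        new_q /= new_q.sum(axis=1, keepdims=True)  # keep distributions valid
        q = new_q
    return q

q = propagate(edges, seeds, len(TAGS))
for i, dist in enumerate(q):
    print(i, TAGS[int(dist.argmax())])
```

In the paper's pipeline the resulting distributions are not used directly as predictions; per the abstract, the projected labels become features in an unsupervised target-language model.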

Citations

Unsupervised and Lightly Supervised Part-of-Speech Tagging Using Recurrent Neural Networks
TLDR
A novel approach is proposed to automatically induce a part-of-speech (POS) tagger for resource-poor languages (languages that have no labeled training data), based on cross-language projection of linguistic annotations from parallel corpora without the use of word alignment information.
Cross-lingual part-of-speech tagging using word embedding
TLDR
The results suggest the efficacy of the approach over traditional label propagation with lexical features for projecting part-of-speech information across languages, and show that even a small amount of labeled data substantially improves performance on the cross-lingual task.
Unsupervised Cross-Lingual Part-of-Speech Tagging for Truly Low-Resource Scenarios
TLDR
This work describes a fully unsupervised cross-lingual transfer approach for part-of-speech (POS) tagging under a truly low resource scenario and shows that using multi-source information, either via projection or output combination, improves the performance for most target languages.
Wiki-ly Supervised Part-of-Speech Tagging
TLDR
This paper shows that it is possible to build POS-taggers exceeding state-of-the-art bilingual methods by using simple hidden Markov models and a freely available and naturally growing resource, the Wiktionary.
Unsupervised adaptation of supervised part-of-speech taggers for closely related languages
TLDR
This work proposes to circumvent this bottleneck by training a supervised HMM tagger on a closely related language for which annotated data are available, and translating the words in the tagger parameter files into the low-resource language.
Part-of-speech Taggers for Low-resource Languages using CCA Features
TLDR
A probability-based confidence model is developed to identify words with highly likely tag projections and use these words to train a multi-class SVM on the CCA features, yielding an average accuracy of 85% for languages with almost no resources and outperforming a state-of-the-art partially observed CRF model.
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
TLDR
It is shown that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext, and the applicability of coupled token and type constraints is empirically demonstrated across a diverse set of languages.
Cross-Lingual Morphological Tagging for Low-Resource Languages
TLDR
This approach extends existing methods for projecting part-of-speech tags across languages, using bitext to infer constraints on the possible tags for a given word type or token, and employs Wsabie, a discriminative embedding-based model with rank-based learning.
A language-independent and fully unsupervised approach to lexicon induction and part-of-speech tagging for closely related languages
In this paper, we describe our generic approach for transferring part-of-speech annotations from a resourced language towards an etymologically closely related non-resourced language, without using
...
...

References

SHOWING 1-10 OF 29 REFERENCES
Multilingual Part-of-Speech Tagging: Two Unsupervised Approaches
TLDR
This work considers two ways of applying this intuition to the problem of unsupervised part-of-speech tagging: a model that directly merges tag structures for a pair of languages into a single sequence and a second model which instead incorporates multilingual context using latent variables.
Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to part-of-speech (POS) tagging. The algorithm uses a similarity graph to
A Universal Part-of-Speech Tagset
TLDR
This work proposes a tagset that consists of twelve universal part-of-speech categories and develops a mapping from 25 different treebank tagsets to this universal set, which when combined with the original treebank data produces a dataset consisting of common parts-of-speech for 22 different languages.
Inducing Multilingual POS Taggers and NP Bracketers via Robust Projection Across Aligned Corpora
This paper investigates the potential for projecting linguistic annotations including part-of-speech tags and base noun phrase bracketings from one language to another via automatically word-aligned
Unsupervised Multilingual Grammar Induction
TLDR
A generative Bayesian model is formulated which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters, and loosely binds parallel trees while allowing language-specific syntactic structure.
A Backoff Model for Bootstrapping Resources for Non-English Languages
TLDR
This paper proposes a novel approach of combining a bootstrapped resource with a small amount of manually annotated data and shows that this approach achieves a significant improvement over EM and self-training and systems that are only trained on manual annotations.
Dependency Grammar Induction via Bitext Projection Constraints
TLDR
This work considers generative and discriminative models for dependency grammar induction that use word-level alignments and a source-language parser to constrain the space of possible target trees; it evaluates the approach on Bulgarian and Spanish CoNLL shared-task data and shows that it consistently outperforms unsupervised methods and can outperform supervised learning when training data is limited.
Minimized Models for Unsupervised Part-of-Speech Tagging
TLDR
A novel method is described that uses integer programming to explicitly search for the smallest model that explains the data, and then uses EM to set parameter values, and performs better than existing state-of-the-art systems in both settings.
CoNLL-X Shared Task on Multilingual Dependency Parsing
TLDR
This paper describes how treebanks for 13 languages were converted into the same dependency format and how parsing performance was measured, and draws general conclusions about multilingual parsing.
Two Decades of Unsupervised POS Induction: How Far Have We Come?
TLDR
It is shown that some of the oldest (and simplest) systems stand up surprisingly well against more recent approaches, and the idea of evaluating systems based on their ability to produce cluster prototypes that are useful as input to a prototype-driven learner is introduced.
...
...