Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging?

@inproceedings{Zhou2022MaskedPM,
  title={Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging?},
  author={Xiang Zhou and Shiyue Zhang and Mohit Bansal},
  booktitle={NAACL},
  year={2022}
}
Previous Part-Of-Speech (POS) induction models typically rely on independence assumptions (e.g., Markov, unidirectional, local dependency) that do not hold in real languages. For example, subject-verb agreement can be both long-term and bidirectional. To facilitate flexible dependency modeling, we propose a Masked Part-of-Speech Model (MPoSM), inspired by the recent success of Masked Language Models (MLM). MPoSM can model arbitrary tag dependency and perform POS induction through the… 
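The sketch below illustrates the general masked-tag-prediction idea described in the abstract, not the authors' actual architecture or training procedure: a small bidirectional Transformer encoder reads word embeddings together with partially masked tag embeddings and is trained to reconstruct the masked tags from the full sentence context, so no Markov or locality assumption is imposed. All module names, sizes, and the placeholder tag assignments are illustrative assumptions.

import torch
import torch.nn as nn

VOCAB, NUM_TAGS, DIM = 1000, 45, 64  # toy sizes; index NUM_TAGS is the [MASK] tag

class MaskedTagPredictor(nn.Module):
    """Bidirectional encoder that predicts masked POS tags from full context."""
    def __init__(self):
        super().__init__()
        self.word_emb = nn.Embedding(VOCAB, DIM)
        self.tag_emb = nn.Embedding(NUM_TAGS + 1, DIM)  # +1 for the [MASK] tag
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(DIM, NUM_TAGS)

    def forward(self, words, tags, mask):
        tags = tags.masked_fill(mask, NUM_TAGS)  # hide the tags to be predicted
        hidden = self.encoder(self.word_emb(words) + self.tag_emb(tags))
        return self.out(hidden)  # (batch, seq, NUM_TAGS) logits

model = MaskedTagPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

words = torch.randint(0, VOCAB, (8, 20))    # toy word ids
tags = torch.randint(0, NUM_TAGS, (8, 20))  # placeholder tag assignments; in the real
                                            # unsupervised setting these come from the
                                            # model itself, not from gold annotation
mask = torch.rand(8, 20) < 0.15             # mask roughly 15% of tag positions

logits = model(words, tags, mask)
loss = nn.functional.cross_entropy(logits[mask], tags[mask])  # loss on masked tags only
loss.backward()
optimizer.step()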

References

Showing 1-10 of 49 references

Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

TLDR
A novel approach is presented for inducing unsupervised part-of-speech taggers for languages that have no labeled training data but have translated text in a resource-rich language, using graph-based label propagation for cross-lingual knowledge transfer.

Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models

TLDR
This work tackles unsupervised part-of-speech (POS) tagging by learning hidden Markov models (HMMs) that are particularly well-suited for the problem and extends the non-negative matrix factorization framework of Arora et al. (2013) to design a consistent estimator for anchor HMMs.

Deep Clustering of Text Representations for Supervision-Free Probing of Syntax

TLDR
This work explores deep clustering of multilingual text representations for unsupervised model interpretation and induction of syntax, and finds that Multilingual BERT (mBERT) contains a surprising amount of syntactic knowledge of English, possibly even as much as English BERT (E-BERT).
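As a rough illustration of the recipe summarized above (with plain k-means standing in for the paper's deep clustering objective, and the model name and cluster count chosen arbitrarily), one can cluster mBERT token representations and read the resulting cluster ids as induced syntactic categories:

import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

sentences = ["The cat sat on the mat .", "Dogs bark loudly ."]
batch = tokenizer(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state  # (batch, seq, dim)

# Drop padding and special tokens before clustering.
keep = batch["attention_mask"].bool()
keep &= batch["input_ids"] != tokenizer.cls_token_id
keep &= batch["input_ids"] != tokenizer.sep_token_id
vectors = hidden[keep].numpy()

clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(vectors)
print(clusters)  # one induced category id per subword token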

Unsupervised Learning of Syntactic Structure with Invertible Neural Projections

TLDR
A novel generative model is proposed that jointly learns discrete syntactic structure and continuous word representations in an unsupervised fashion by cascading an invertible neural network with a structured generative prior; invertibility allows efficient exact inference so long as the prior is well-behaved.

Two Decades of Unsupervised POS Induction: How Far Have We Come?

TLDR
It is shown that some of the oldest (and simplest) systems stand up surprisingly well against more recent approaches, and the idea of evaluating systems based on their ability to produce cluster prototypes that are useful as input to a prototype-driven learner is introduced.

Unsupervised Multilingual Learning for POS Tagging

TLDR
A hierarchical Bayesian model is formulated for jointly predicting bilingual streams of part-of-speech tags; it learns language-specific features while capturing cross-lingual patterns in tag distributions for aligned words.

Mutual Information Maximization for Simple and Accurate Part-Of-Speech Induction

TLDR
This work addresses part-of-speech (POS) induction by maximizing the mutual information between the induced label and its context and shows that the variational lower bound is robust whereas the generalized Brown objective is vulnerable.

On the Role of Supervision in Unsupervised Constituency Parsing

We analyze several recent unsupervised constituency parsing models, which are tuned with respect to the parsing $F_1$ score on the Wall Street Journal (WSJ) development set (1,700 sentences)…

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling

TLDR
This work proposes a new parsing framework that can jointly generate a constituency tree and dependency graph and integrates the induced dependency relations into the transformer, in a differentiable manner, through a novel dependency-constrained self-attention mechanism.

Why Doesn’t EM Find Good HMM POS-Taggers?

TLDR
This paper investigates why the HMMs estimated by Expectation-Maximization (EM) produce such poor results as Part-of-Speech (POS) taggers, examines Gibbs Sampling (GS) and Variational Bayes (VB) estimators, and shows that VB converges faster than GS for this task and significantly improves 1-to-1 tagging accuracy over EM.
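For reference, below is a small self-contained sketch of the two evaluation conventions these induction papers report (not code from any of the cited works): many-to-one accuracy maps each induced cluster to its most frequent gold tag, while 1-to-1 accuracy forces a bijective cluster-to-tag mapping via Hungarian matching.

import numpy as np
from scipy.optimize import linear_sum_assignment

def induction_accuracy(gold, pred, num_tags, num_clusters):
    """gold, pred: integer arrays of gold tag ids / induced cluster ids per token."""
    counts = np.zeros((num_clusters, num_tags), dtype=np.int64)
    np.add.at(counts, (pred, gold), 1)  # cluster-tag co-occurrence matrix

    many_to_one = counts.max(axis=1).sum() / len(gold)

    # 1-to-1: choose the bijection between clusters and tags that maximizes
    # the number of correctly mapped tokens.
    rows, cols = linear_sum_assignment(-counts)
    one_to_one = counts[rows, cols].sum() / len(gold)
    return many_to_one, one_to_one

# Toy usage with random labels (illustrative only).
rng = np.random.default_rng(0)
gold = rng.integers(0, 12, size=5000)
pred = rng.integers(0, 12, size=5000)
print(induction_accuracy(gold, pred, num_tags=12, num_clusters=12))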