Hierarchical clustering of word class distributions


We propose an unsupervised approach to POS tagging. First, we associate each word type with a probability distribution over word classes using Latent Dirichlet Allocation. We then build a hierarchical clustering of the word types with an agglomerative algorithm in which the distance between clusters is defined as the Jensen-Shannon divergence between the probability distributions over classes associated with each word type. To assign a POS tag, we find the tree leaf most similar to the current word and use the prefix of the path leading to that leaf as the tag. This simple labeler outperforms a baseline based on Brown clusters on 9 out of 10 datasets.
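
The pipeline described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The LDA step is replaced by hand-made toy class distributions, a cluster is represented by the mean of its members' distributions (an assumption, since the abstract does not state the linkage rule), and the names `build_tree`, `leaf_paths`, `tag`, and the `depth` parameter are hypothetical.

```python
import math
from itertools import combinations

def kl(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    # Jensen-Shannon divergence: mean KL divergence to the midpoint distribution.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mean_dist(members, dists):
    # Assumption: a cluster's distribution is the average of its members'.
    n = len(members)
    size = len(dists[members[0]])
    return [sum(dists[w][k] for w in members) / n for k in range(size)]

def build_tree(dists):
    # Each cluster is (subtree, member words); leaves are plain strings.
    clusters = [(w, [w]) for w in sorted(dists)]
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest JS divergence.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: js_divergence(
                mean_dist(clusters[ij[0]][1], dists),
                mean_dist(clusters[ij[1]][1], dists),
            ),
        )
        (ti, mi), (tj, mj) = clusters[i], clusters[j]
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((ti, tj), mi + mj)]
    return clusters[0][0]

def leaf_paths(tree, prefix=""):
    # Map each leaf word to its root-to-leaf path as a bit string.
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    paths = leaf_paths(left, prefix + "0")
    paths.update(leaf_paths(right, prefix + "1"))
    return paths

def tag(word_dist, dists, paths, depth):
    # Find the most similar leaf and keep the first `depth` bits of its path.
    nearest = min(dists, key=lambda w: js_divergence(word_dist, dists[w]))
    return paths[nearest][:depth]

# Toy word-class distributions standing in for the LDA output.
dists = {
    "dog": [0.8, 0.1, 0.1],
    "cat": [0.7, 0.2, 0.1],
    "run": [0.1, 0.8, 0.1],
    "walk": [0.1, 0.7, 0.2],
}
tree = build_tree(dists)
paths = leaf_paths(tree)
```

Calling `tag([0.75, 0.15, 0.1], dists, paths, depth=1)` returns the one-bit prefix shared by "dog" and "cat". A longer prefix gives a finer-grained tag set, so `depth` controls the granularity of the induced tags in this sketch.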

Cite this paper

@inproceedings{Chrupala2012HierarchicalCO, title={Hierarchical clustering of word class distributions}, author={Grzegorz Chrupala}, year={2012} }