Learning Accurate, Compact, and Interpretable Tree Annotation


We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple X-bar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals. In contrast with previous work, we are able to split various terminals to different degrees, as appropriate to the actual complexity in the data. Our grammars automatically learn the kinds of linguistic distinctions exhibited in previous work on manual tree annotation. On the other hand, our grammars are much more compact and substantially more accurate than previous work on automatic annotation. Despite its simplicity, our best grammar achieves an F1 of 90.2% on the Penn Treebank, higher than fully lexicalized systems.
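The split step mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the procedure, so everything here is an assumption made for illustration: the function name `split_grammar`, the dictionary representation of binary rules, the choice to split every symbol into exactly two subsymbols, and the small random perturbation used to break symmetry. The merge step and the likelihood-driven retraining are not shown, and lexical rules are left out.

```python
import random
from collections import defaultdict


def split_grammar(rules, noise=0.01, seed=0):
    """Split every symbol X into two subsymbols, X_0 and X_1.

    `rules` maps (parent, left_child, right_child) -> probability, and the
    probabilities of all rules sharing a parent are assumed to sum to 1.
    This is a hypothetical representation chosen for illustration only.
    """
    rng = random.Random(seed)
    split = {}
    for (parent, left, right), prob in rules.items():
        for i in range(2):          # subsymbol index of the parent
            for j in range(2):      # subsymbol index of the left child
                for k in range(2):  # subsymbol index of the right child
                    # Share the rule's mass evenly over the four child
                    # combinations, with slight noise so the two halves
                    # of a symbol do not stay identical.
                    jitter = 1.0 + rng.uniform(-noise, noise)
                    key = (f"{parent}_{i}", f"{left}_{j}", f"{right}_{k}")
                    split[key] = prob / 4 * jitter

    # Renormalize so the rules of each split parent sum to 1 again.
    totals = defaultdict(float)
    for (parent, _, _), prob in split.items():
        totals[parent] += prob
    return {rule: prob / totals[rule[0]] for rule, prob in split.items()}


if __name__ == "__main__":
    toy_rules = {
        ("S", "NP", "VP"): 1.0,
        ("NP", "DT", "NN"): 0.7,
        ("NP", "NP", "PP"): 0.3,
    }
    for rule, prob in sorted(split_grammar(toy_rules).items()):
        print(rule, round(prob, 4))
```

In a full split-and-merge loop, a split like this would be followed by re-estimating the rule probabilities on the treebank and then merging back those subsymbol pairs whose separation contributes little to the likelihood, which is how different symbols could end up split to different degrees.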


