The Penn Chinese TreeBank: Phrase structure annotation of a large corpus

  • Naiwen Xue, F. Xia, F. Chiou, Martha Palmer
  • Published 2005
  • Computer Science
  • Nat. Lang. Eng.
  • With growing interest in Chinese Language Processing, numerous NLP tools (e.g., word segmenters, part-of-speech taggers, and parsers) for Chinese have been developed all over the world. However, since no large-scale bracketed corpora are available to the public, these tools are trained on corpora with different segmentation criteria, part-of-speech tagsets and bracketing guidelines, and therefore, comparisons are difficult. As a first step towards addressing this issue, we have been preparing a…