Corpus ID: 14324763

Linguistic Phenomena, Analyses, and Representations: Understanding Conversion between Treebanks

@inproceedings{Bhatt2011LinguisticPA,
  title={Linguistic Phenomena, Analyses, and Representations: Understanding Conversion between Treebanks},
  author={Rajesh Bhatt and Owen Rambow and F. Xia},
  booktitle={IJCNLP},
  year={2011}
}
Treebanks are valuable resources for natural language processing (NLP). Much work in NLP converts treebanks from one representation (e.g., phrase structure) to another (e.g., dependency) before applying machine learning. This paper provides a framework in which to think about the question of when such a conversion is possible.
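As a concrete illustration of the phrase-structure-to-dependency direction mentioned above, here is a minimal sketch of the common head-percolation approach. It is not the method of this paper, which is a conceptual framework. The tree encoding, the names `HEAD_RULES` and `to_dependencies`, and the tiny head-rule table are all hypothetical simplifications. Real converters use much larger, treebank-specific rule sets.

```python
from typing import List, Tuple

# Assumed encoding: a leaf is (POS tag, word); an internal node is (label, [children]).

# Hypothetical, heavily simplified head rules: for each phrase label, the search
# direction and the child labels to look for, in priority order.
HEAD_RULES = {
    "S": ("left", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN"]),
}


def is_leaf(node) -> bool:
    # Leaves carry a word string in second position; internal nodes carry a list.
    return isinstance(node[1], str)


def head_child(label: str, children: list) -> int:
    """Return the index of the head child, falling back to the first child searched."""
    direction, priorities = HEAD_RULES.get(label, ("left", []))
    order = list(range(len(children)))
    if direction == "right":
        order.reverse()
    for wanted in priorities:
        for i in order:
            if children[i][0] == wanted:
                return i
    return order[0]


def to_dependencies(tree) -> List[Tuple[int, int]]:
    """Convert a phrase-structure tree into (head, dependent) word-index arcs.

    Words are numbered 1..n from left to right; the root word's head is 0.
    Arcs are returned sorted by dependent index.
    """
    arcs: List[Tuple[int, int]] = []
    counter = [0]

    def visit(node) -> int:
        # Returns the word index of the lexical head of `node`.
        if is_leaf(node):
            counter[0] += 1
            return counter[0]
        label, children = node
        heads = [visit(child) for child in children]
        h = heads[head_child(label, children)]
        # Every non-head child's lexical head depends on the head child's lexical head.
        for d in heads:
            if d != h:
                arcs.append((h, d))
        return h

    root = visit(tree)
    arcs.append((0, root))
    return sorted(arcs, key=lambda arc: arc[1])
```

For example, `to_dependencies(("S", [("NP", [("NNP", "John")]), ("VP", [("VBD", "saw"), ("NP", [("NNP", "Mary")])])]))` returns `[(2, 1), (0, 2), (2, 3)]`. In this result, "saw" heads both "John" and "Mary" and is attached to the root. The phrase labels (S, NP, VP) do not survive in the output. This hints at why the reverse conversion is harder to perform in general.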

Creating a Tree Adjoining Grammar from a Multilayer Treebank

It is shown that the resulting TAG along with corresponding dependency structure can be used to convert a dependency treebank to a TAG-based phrase structure treebank.

Converting SynTagRus Dependency Treebank into Penn Treebank Style

SynTagRus dependency structures are converted into Penn Treebank-style phrase structures. The resulting data will be used to train a statistical constituency parser for Russian and to create a large-scale constituency-parsed corpus.

Converting Dependency Structure Into Persian Phrase Structure

This article proposes a method to convert a dependency structure into a phrase structure. It enriches the trainable model of an earlier hybrid-strategy approach by adding a classifier to the algorithm and applying post-processing modifications. The reported results show a reduced error rate and improved conversion quality.

Challenges in Converting between Treebanks : a Case Study from the HUTB

An important question for treebank development is whether high-quality conversion from one representation (e.g., dependency structure) to another representation (e.g., phrase structure) is possible…

Keeping it Simple: Generating Phrase Structure Trees from a Hindi Dependency Treebank

A conversion algorithm is proposed that converts the Hindi-Urdu Dependency Treebank to a Phrase Structure (PS) representation and generates ‘valid’ PS trees that are relatively flat with few empty categories and very close to the DS trees in their syntactic content.

Rule-Based Detection and Analysis of Annotation Errors in Dependency Treebank

The authors transform dependency trees into phrase structure trees and detect annotation errors automatically using manually written rules. The approach can further improve treebank quality and can be applied to other dependency treebanks.

Multi-view Chinese Treebanking

We present a multi-view annotation framework for Chinese treebanking, which uses dependency structures as the base view and supports conversion into phrase structures with minimal loss of…

The Hindi/Urdu Treebank Project

The goal of the Hindi/Urdu treebanking project is to build multi-layered treebanks that provide both syntactic and semantic annotations. These annotations cover two standardized registers that are often considered separate languages: Hindi and Urdu.

References

Hindi Syntax: Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure

A treebanking project for Hindi/Urdu is annotating dependency syntax, lexical predicate-argument structure, and phrase structure syntax in a coordinated and partly automated manner.

Converting Dependency Structures to Phrase Structures

This work not only provides ways to convert Treebanks from one type of representation to the other, but also clarifies the differences in representational coverage of the two approaches.

Towards a Multi-Representational Treebank

This paper shows that high-quality DS-to-PS conversion is possible if the conversion process is performed at the designing stage of treebank construction to ensure that all information the authors wish to represent in PS is provided in DS.

A Statistical Parser for Czech

This paper considers statistical parsing of Czech, which differs radically from English in at least two respects: (1) it is a highly inflected language, and (2) it has relatively free word order.

Automatic annotation of the Penn-treebank with LFG f-structure information

This paper presents a new method that scales and has been applied to a complete treebank: the WSJ section of Penn-II (Marcus et al., 1994), with more than 1,000,000 words in about 50,000 sentences.

CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank

This article presents an algorithm for translating the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations augmented with local and long-range word-word dependencies, and discusses the implications of the findings for the extraction of other linguistically expressive grammars from the Treebank, and for the design of future treebanks.

Dependency Annotation Scheme for Indian Languages

The paper explains the motivation for adopting the Paninian framework as the annotation scheme. It argues that this framework is better suited to modeling the various linguistic phenomena found in Indian languages.

Adding Semantic Annotation to the Penn TreeBank

This paper presents the basic approach to creating Proposition Bank, which involves adding a layer of semantic annotation to the Penn English TreeBank, and provides explicit guidelines for labeling all of the syntactic and semantic frames of each particular verb.

Italian Syntax: A Government-Binding Approach

The author examines the development of inversion and its role in theory construction, including the case of the faire-infinitive construction.

Lectures on Government and Binding