Parsing Aligned Parallel Corpus by Projecting Syntactic Relations from Annotated Source Corpus

Abstract

Example-based parsing has already been proposed in the literature. In particular, attempts are being made to develop techniques for language pairs in which the source and target languages differ, e.g. the Direct Projection Algorithm (DPA) (Hwa et al., 2005). Such techniques make it possible to build a parsed corpus for a target language that has few linguistic tools by drawing on a resource-rich source language. The DPA rests on the Direct Correspondence assumption: a relation between two words of the source-language sentence can be projected directly onto the corresponding words of the parallel target-language sentence. However, we find that this assumption does not always hold, which leads to incorrectly parsed structures for the target-language sentence. As a solution, we propose an algorithm called pseudo DPA (pDPA) that works even when the Direct Correspondence assumption is not guaranteed. The proposed algorithm works recursively, considering the embedded phrase structures from the outermost level to the innermost. The present work discusses the pDPA algorithm and illustrates it for the English-Hindi language pair. Link Grammar based parsing is used as the underlying parsing scheme for this work.
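
The Direct Correspondence assumption described in the abstract can be illustrated with a minimal sketch. It is not the authors' pDPA, nor the exact DPA of Hwa et al. (2005): it assumes a one-to-one word alignment given as a dictionary, and the function name `project_relations`, the toy sentences, and the relation labels are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# A relation links two token positions and carries a label: (word_a, word_b, label).
Relation = Tuple[int, int, str]


def project_relations(
    source_relations: List[Relation],
    alignment: Dict[int, int],
) -> Tuple[List[Relation], List[Relation]]:
    """Project source relations onto target tokens via a one-to-one alignment.

    Returns (projected, dropped): relations carried over to the target side,
    and source relations with at least one unaligned endpoint.
    """
    projected: List[Relation] = []
    dropped: List[Relation] = []
    for word_a, word_b, label in source_relations:
        if word_a in alignment and word_b in alignment:
            # Direct Correspondence: reuse the same label between the aligned words.
            projected.append((alignment[word_a], alignment[word_b], label))
        else:
            dropped.append((word_a, word_b, label))
    return projected, dropped


# Toy data invented for illustration (not taken from the paper).
source_tokens = ["Ram", "eats", "mangoes"]
target_tokens = ["raam", "aam", "khaataa", "hai"]
source_relations = [(1, 0, "S"), (1, 2, "O")]  # subject and object relations
alignment = {0: 0, 1: 2, 2: 1}  # source position -> target position

projected, dropped = project_relations(source_relations, alignment)
for word_a, word_b, label in projected:
    print(f"{label}({target_tokens[word_a]}, {target_tokens[word_b]})")
```

In this toy case the target word "hai" has no aligned source word, so direct projection leaves it without any relation. Gaps of this kind are one way the assumption can fail to yield a complete target-side parse.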

Cite this paper

@inproceedings{Goyal2006ParsingAP,
  title={Parsing Aligned Parallel Corpus by Projecting Syntactic Relations from Annotated Source Corpus},
  author={Shailly Goyal and Niladri Chatterjee},
  booktitle={ACL},
  year={2006}
}