We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TPs) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasi-synchronous grammar features to augment the baseline parsing models. Our approach can significantly advance the state-of-the-art parsing accuracy on two widely used target treebanks…
Figure 4: Most frequent transformation patterns (TPs) when using CDT as the source treebank and CTB5 as the target. A TP comprises two syntactic structures, one on the source side and the other on the target side, and denotes the process by which the left-side subtree is transformed into the right-side structure. The functions ψdep(·), ψsib(·), and ψgrd(·) return the specific TP type for a candidate scoring part according to the source tree d′.
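To make the role of ψdep(·) concrete, the following is a minimal sketch of how such a function might classify a candidate dependency h → m by inspecting the source tree d′. The function name `psi_dep`, the head-array encoding of d′, and the label set ("consistent", "reverse", "grandparent", "sibling", "else") are assumptions made for illustration only; they are not taken from the figure and may differ from the TP inventory actually used.

```python
from typing import List


def psi_dep(source_heads: List[int], h: int, m: int) -> str:
    """Return a hypothetical TP label for the candidate dependency h -> m.

    source_heads encodes the source tree d': source_heads[i] is the head of
    token i, index 0 is an artificial root, and source_heads[0] == -1.
    The labels below are illustrative assumptions, not the paper's TP set.
    """
    head_of_m = source_heads[m]
    if head_of_m == h:
        # The source tree contains the same dependency.
        return "consistent"
    if source_heads[h] == m:
        # The source tree has the dependency in the opposite direction.
        return "reverse"
    if head_of_m != -1 and source_heads[head_of_m] == h:
        # h is the grandparent of m in the source tree.
        return "grandparent"
    if source_heads[h] == head_of_m:
        # h and m share the same head in the source tree.
        return "sibling"
    return "else"


# Toy source tree: token 2 attaches to the root, tokens 1 and 3 attach to
# token 2, and token 4 attaches to token 3.
toy_heads = [-1, 2, 0, 2, 3]
labels = {
    (2, 1): psi_dep(toy_heads, 2, 1),
    (1, 2): psi_dep(toy_heads, 1, 2),
    (0, 1): psi_dep(toy_heads, 0, 1),
    (1, 3): psi_dep(toy_heads, 1, 3),
    (1, 4): psi_dep(toy_heads, 1, 4),
}
```

In a feature-based parser, a label of this kind would typically be conjoined with the baseline features of the scoring part, so that the model can learn how much to trust each source-side configuration. Analogous functions for sibling and grandparent parts would inspect three tokens instead of two.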