We present the first evaluation of the utility of automatic evaluation metrics on surface realizations of Penn Treebank data. Using outputs of the OpenCCG and XLE realizers, along with ranked WordNet synonym substitutions, we collected a corpus of generated surface realizations. These outputs were then rated and post-edited by human annotators. We ...
