Efficient Parsing with Large-Scale Unification Grammars

Abstract

The efficiency problem in parsing with large-scale unification grammars, including implementations in the Head-driven Phrase Structure Grammar (HPSG) framework, used to be a serious obstacle to their application in research and commercial settings. Over the past few years, however, significant progress in efficient processing has been achieved. Still, many of the proposed techniques were developed only in isolation, which makes it difficult to compare them and to assess their combined potential. Moreover, a number of techniques were never evaluated on large-scale grammars. This thesis sets out to improve this situation by reviewing, integrating, and evaluating a number of techniques for efficient unification-based parsing.

A particular focus is on efficient graph unification. I provide an overview of previous work in this area, including the foundational algorithm of Wroblewski (1987), in which I identify a previously unnoticed flaw and for which I provide a solution. I introduce the PET platform, which has been developed with two goals: (i) to serve as a flexible basis for research in efficient processing techniques, allowing precise empirical study and comparison of different approaches, and (ii) to provide an efficient run-time processor that supports fruitful scientific and practical use of HPSG grammars. The design and implementation of PET are presented in detail, including a closer look at efficient semi-lattice computation in the preprocessor.

A number of experiments with PET are discussed, using three existing large-scale HPSG grammars of English, Japanese, and German. I give precise empirical answers to some open research questions, most importantly the question of feature structure encoding (lists of feature-value pairs versus representations based on fixed arity), and show that this is a much less important factor than often assumed. I also address the question of predicting practical performance across grammars and processing platforms.

Finally, I take a wider perspective and report on the overall improvement in processing performance for HPSG grammars (as exemplified by the LinGO grammar) that has been achieved over a period of four years by an international consortium of research groups.
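To make the notion of graph unification more concrete, here is a minimal Python sketch of destructive unification of untyped feature structures using forwarding pointers. It is neither Wroblewski's (1987) non-destructive algorithm nor PET's implementation; the `Node` class and the `deref`, `unify`, and `get` helpers are hypothetical names introduced only for illustration. Shared sub-structures (re-entrancies) are modelled simply as shared Python objects.

```python
class Node:
    """A feature-structure node: an atom (value is set) or a complex node with arcs."""

    def __init__(self, value=None, arcs=None):
        self.value = value              # atomic value, or None for a complex node
        self.arcs = dict(arcs or {})    # feature name -> Node
        self.forward = None             # set when this node is merged into another


def deref(node):
    """Follow forwarding pointers to the node that currently represents `node`."""
    while node.forward is not None:
        node = node.forward
    return node


def unify(a, b):
    """Destructively unify a and b; return the result node, or None on failure.

    Sketch only: there is no undo, so a failed call may leave its inputs modified."""
    a, b = deref(a), deref(b)
    if a is b:
        return a
    if a.value is not None or b.value is not None:
        atom, other = (a, b) if a.value is not None else (b, a)
        if other.value is not None and other.value != atom.value:
            return None     # two different atoms
        if other.arcs:
            return None     # atom vs. node with features
        other.forward = atom
        return atom
    # Both complex: merge b into a, then unify shared features recursively.
    b.forward = a
    for feat, b_val in b.arcs.items():
        if feat in a.arcs:
            if unify(a.arcs[feat], b_val) is None:
                return None
        else:
            a.arcs[feat] = b_val
    return a


def get(node, *path):
    """Follow a feature path from `node`, dereferencing at each step."""
    node = deref(node)
    for feat in path:
        node = deref(node.arcs[feat])
    return node


# Re-entrancy: the AGR value is shared between the top level and SUBJ.
shared = Node()
fs1 = Node(arcs={"AGR": shared, "SUBJ": Node(arcs={"AGR": shared})})
fs2 = Node(arcs={"SUBJ": Node(arcs={"AGR": Node(arcs={"NUM": Node("sg")})})})
result = unify(fs1, fs2)
print(get(result, "AGR", "NUM").value)  # prints sg
```

Because this variant overwrites its inputs and does not undo partial changes on failure, a parser using it would normally have to copy structures before each unification attempt; reducing that copying cost is the motivation behind non-destructive schemes such as Wroblewski's.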
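The semi-lattice computation mentioned in the abstract concerns the type hierarchy: unifying two types means finding their greatest lower bound (GLB). One well-known technique, shown here as a hedged Python sketch rather than a description of PET's preprocessor, encodes each type as a bit vector of itself and its subtypes, so that a GLB reduces to a bitwise AND followed by a table lookup. The toy hierarchy and the names `HIERARCHY`, `encode`, and `make_glb` are hypothetical, and the sketch assumes the hierarchy is already closed under GLBs.

```python
# Hypothetical toy type hierarchy: each type lists its immediate supertypes.
# Assumption: the hierarchy is already closed under greatest lower bounds (GLBs);
# "gerund" is the GLB of "noun" and "verb" via multiple inheritance.
HIERARCHY = {
    "*top*": [],
    "sign": ["*top*"],
    "word": ["sign"],
    "phrase": ["sign"],
    "head": ["*top*"],
    "noun": ["head"],
    "verb": ["head"],
    "gerund": ["noun", "verb"],
}


def encode(hierarchy):
    """Give each type its own bit, then OR every type's bits into all its supertypes,
    so that code[t] is the set containing t and all of its descendants."""
    code = {t: 1 << i for i, t in enumerate(hierarchy)}
    changed = True
    while changed:  # iterate to a fixpoint; the hierarchy is assumed acyclic
        changed = False
        for t, supers in hierarchy.items():
            for s in supers:
                merged = code[s] | code[t]
                if merged != code[s]:
                    code[s] = merged
                    changed = True
    return code


def make_glb(hierarchy):
    code = encode(hierarchy)
    by_code = {c: t for t, c in code.items()}

    def glb(a, b):
        """Return the GLB of types a and b, or None if they have no common subtype."""
        common = code[a] & code[b]
        if common == 0:
            return None  # type clash: unification fails
        if common not in by_code:
            raise ValueError(f"no unique GLB for {a!r} and {b!r}: hierarchy not GLB-closed")
        return by_code[common]

    return glb


glb = make_glb(HIERARCHY)
print(glb("noun", "verb"))    # prints gerund
print(glb("sign", "phrase"))  # prints phrase
print(glb("word", "phrase"))  # prints None
```

Precomputing the codes once makes each GLB query a constant-time bit operation plus a dictionary lookup; a real preprocessor would additionally have to insert any missing GLB types before this encoding is sound.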

18 Figures and Tables

@inproceedings{Callmeier2001EfficientPW,
  title={Efficient Parsing with Large-Scale Unification Grammars},
  author={Ulrich Callmeier and Gert Smolka},
  year={2001}
}