Interlingua Approximation: A Generation-Heavy Approach

Abstract

To date, construction of interlingual resources continues to be a labor-intensive process, often resulting in knowledge-based systems that suffer from a lack of robustness. Such systems may work well on certain types of phenomena, but their complex knowledge-based foundation makes them difficult to extend to new phenomena or languages. We adopt the view that it is possible to approximate the depth of knowledge-based interlingual systems by tapping into the richness of target-language (TL) resources (English, in our projects) and using this information to map the source-language (SL) input to the English output. A key feature of our approach is the use of some, but not all, components of an interlingual representation (e.g., the top-level primitives and basic argument structure) to map representations associated with a resource-poor language into those of a resource-rich language. The approach lends itself to the generation of multiple candidate sentences, which are then pared down statistically so that the most likely sentence is produced according to the constraints of the TL.
Consider the oft-cited Spanish example, “Yo le di puñaladas a John” (literally “I gave knife wounds to John”, i.e., “I stabbed John”). Such cases are traditionally handled in interlingual systems by means of decomposition into a conceptual representation (Dorr, 1993). We espouse a more economical approach that uses the structure of syntactic dependencies coupled with knowledge encoded in the Lexical Conceptual Structure Verb Database (LVD) (Dorr, 2001). More specifically, rather than mapping the SL input into a representation with the full range of interlingual components, this simpler approach uses only the argument structure of the input dependency tree and top-level conceptual nodes (such as “CAUSE GO”) coupled with thematic-role information.
To produce a TL (English) sentence from this representation, the top-level conceptual nodes are first checked for possible matches. Conflated arguments (the STAB_N node below) are then potentially absorbed into other predicate positions, as long as there is a relation between the conflated argument and the new predicate node (in this case STAB_V), disregarding part of speech. This process is shown pictorially below.
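The two steps just described (matching top-level conceptual nodes, then absorbing a conflated argument into a related predicate) can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy built for illustration: the `TL_VERBS` and `RELATED` tables, their entries, and the `candidates` function are assumptions, not the contents or interface of the LVD, and the statistical ranking of the resulting candidates is left out.

```python
# Toy TL lexicon (assumed entries, not taken from the LVD):
# verb -> (top-level conceptual node, thematic roles the verb expresses)
TL_VERBS = {
    "give": ("CAUSE GO", {"agent", "theme", "goal"}),
    "stab": ("CAUSE GO", {"agent", "goal"}),
}

# Toy relatedness table that ignores part of speech (assumed):
# argument concept -> predicate it can be absorbed into
RELATED = {"knife_wound": "stab"}


def candidates(primitive, args):
    """Return (verb, remaining_args) pairs compatible with the input.

    primitive: top-level conceptual node of the SL predicate
    args: dict mapping thematic role -> argument concept
    """
    results = []
    for verb, (verb_primitive, roles) in TL_VERBS.items():
        # Step 1: the top-level conceptual nodes must match.
        if verb_primitive != primitive:
            continue
        # The input must supply every role the TL verb expresses.
        if roles - set(args):
            continue
        # Step 2: any input argument the verb has no slot for must be
        # absorbable, i.e. related to the verb regardless of part of speech.
        extra = set(args) - roles
        if all(RELATED.get(args[role]) == verb for role in extra):
            kept = {role: args[role] for role in args if role in roles}
            results.append((verb, kept))
    return results


# "Yo le di puñaladas a John": top-level node plus thematic roles only.
sl_args = {"agent": "I", "theme": "knife_wound", "goal": "John"}
for verb, kept in candidates("CAUSE GO", sl_args):
    print(verb, kept)
```

Under these toy assumptions, both "give" (keeping all three arguments) and "stab" (with the knife-wound theme absorbed into the verb) come back as candidates. Choosing between the corresponding sentences would fall to the statistical paring-down step mentioned in the abstract.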

Cite this paper

@inproceedings{Dorr2002InterlinguaAA, title={Interlingua Approximation: A Generation-Heavy Approach}, author={Bonnie J. Dorr and Nizar Habash}, year={2002} }