Research In Text Processing: Creating Robust And Portable Systems

Abstract

In natural language text, much of the information is implicit, and much of it, viewed in isolation, is ambiguous. Resolving this ambiguity and extracting the intended facts from the text requires additional information about syntactic usage, discourse patterns, and the semantics of particular domains. However, collecting this information manually for each type of text is difficult and time-consuming, and it makes the system non-portable. It is therefore desirable to acquire such characteristics automatically from a sample of text in a particular domain, including the relative preference for different syntactic structures and the domain's semantic classes and constraints. Since the text samples are finite, this information will always be incomplete. In addition, any real text will contain typographical and syntactic errors, as well as semantic relations outside the principal domain. Consequently, a high-performance system will require a forgiving analysis procedure, one which tries to minimize constraint violations but does not insist on "perfect" input. To guide and evaluate our work on the underlying technologies, we have developed three message processing applications over the past five years. The first processed CASREP equipment failure messages; its focus was on deep domain models for language understanding, and in particular on determining the implicit causal and temporal relations between events in a narrative. The other two systems processed RAINFORM and OPREP messages describing naval encounters and engagements. These systems were developed for the Message Understanding Conferences organized by the Naval Ocean Systems Center, and their focus was on robustness: the ability to extract at least partial information despite violations of syntactic or semantic constraints.
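The abstract does not specify how syntactic preferences are estimated or how the forgiving analysis ranks competing readings. The following is a minimal Python sketch under stated assumptions, not the paper's method. It assumes that each candidate analysis records the syntactic structure it uses and the list of constraints it violates. It also assumes that preferences are simple relative frequencies of structures observed in a domain sample. The names `Analysis`, `estimate_preferences`, `choose_analysis`, and the structure labels are hypothetical illustrations.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """One candidate analysis of a sentence (hypothetical representation)."""
    structure: str  # label of the syntactic structure this reading uses
    violations: list[str] = field(default_factory=list)  # constraints it breaks


def estimate_preferences(sample_structures):
    """Relative frequency of each syntactic structure in a domain text sample.

    Assumption: preferences are plain relative frequencies; structures never
    seen in the (necessarily finite) sample are simply absent from the map.
    """
    counts = Counter(sample_structures)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def choose_analysis(candidates, preferences):
    """Forgiving selection: return the candidate with the fewest constraint
    violations, breaking ties in favor of structures that were more frequent
    in the sample. Input is never rejected merely for violating constraints;
    the least-bad candidate is returned even if every candidate violates some.
    """
    if not candidates:
        raise ValueError("no candidate analyses to choose from")

    def cost(a):
        # Lexicographic cost: violation count first, then negated preference
        # (unseen structures get preference 0.0).
        return (len(a.violations), -preferences.get(a.structure, 0.0))

    return min(candidates, key=cost)
```

For example, with a sample in which noun attachment of a prepositional phrase occurs twice as often as verb attachment, the violation-free noun-attachment reading is chosen. If every reading violates something, the one with the fewest violations is still returned, which mirrors the abstract's goal of extracting at least partial information. A lexicographic cost keeps the sketch simple; a weighted score that trades violations against preference strength would be an equally plausible design.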

Cite this paper

@inproceedings{Grishman1990ResearchIT, title={Research In Text Processing: Creating Robust And Portable Systems}, author={Ralph Grishman}, year={1990} }