Automatic induction of rules for text simplification


Long and complicated sentences pose problems for many state-of-the-art natural language technologies. We have been exploring methods to automatically transform such sentences so as to make them simpler. These methods involve the use of a rule-based system, driven by the syntax of the text in the domain of interest. Hand-crafting rules for every domain is time-consuming and impractical. This paper describes an algorithm and an implementation by which generalized rules for simplification are automatically induced from annotated training material, using a novel partial parsing technique that combines constituent structure and dependency information. The algorithm employs example-based generalizations on linguistically motivated structures.

Disciplines: Cognitive Neuroscience | Theory and Algorithms

Comments: University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-96-30. This technical report is available at ScholarlyCommons.

DOI: 10.1016/S0950-7051(97)00029-4

