Motivations and Methods for Text Simplification


Long and complicated sentences prove to be a stumbling block for current systems relying on NL input. These systems stand to gain from methods that syntactically simplify such sentences. To simplify a sentence, we need an idea of the structure of the sentence, to identify the components to be separated out. Obviously a parser could be used to obtain the complete structure of the sentence. However, full parsing is slow and prone to failure, especially on complex sentences. In this paper, we consider two alternatives to full parsing which could be used for simplification. The first approach uses a Finite State Grammar (FSG) to produce noun and verb groups while the second uses a Supertagging model to produce dependency linkages. We discuss the impact of these two input representations on the simplification process.

*On leave from the National Centre for Software Technology, Gulmohar Cross Road No. 9, Juhu, Bombay 400 049, India

1 Reasons for Text Simplification

Long and complicated sentences prove to be a stumbling block for current systems which rely on natural language input. These systems stand to gain from methods that preprocess such sentences so as to make them simpler. Consider, for example, the following sentence:

(1) The embattled Major government survived a crucial vote on coal pits closure as its last-minute concessions curbed the extent of Tory revolt over an issue that generated unusual heat in the House of Commons and brought the miners to London streets.

Such sentences are not uncommon in newswire texts. Compare this with the multi-sentence version which has been manually simplified:

(2) The embattled Major government survived a crucial vote on coal pits closure. Its last-minute concessions curbed the extent of Tory revolt over the coal-mine issue. This issue generated unusual heat in the House of Commons. It also brought the miners to London streets.

If complex text can be made simpler, sentences become easier to process, both for programs and humans. We discuss a simplification process which identifies components of a sentence that may be separated out, and transforms each of these into free-standing simpler sentences.

Clearly, some nuances of meaning from the original text may be lost in the simplification process. Simplification is therefore inappropriate for texts (such as legal documents) where it is important not to lose any nuance. However, one can conceive of several areas of natural language processing where such simplification would be of great use. This is especially true in domains such as machine translation, which commonly have a manual post-processing stage, where semantic and pragmatic repairs may be carried out if necessary.

• Parsing: Syntactically complex sentences are likely to generate a large number of parses, and may cause parsers to fail altogether. Resolving ambiguities in attachment of constituents is nontrivial. This ambiguity is reduced for simpler sentences since they involve fewer constituents. Thus simpler sentences lead to faster parsing and less parse ambiguity. Once the parses for the simpler sentences are obtained, the subparses can be assembled to form a full parse, or left as is, depending on the application.

• Machine Translation (MT): As in the parsing case, simplification results in simpler sentential structures and reduced ambiguity. As argued in (Chandrasekar, 1994), this could lead to improvements in the quality of machine translation.
• Information Retrieval: IR systems usually retrieve large segments of texts of which only a part may be relevant. With simplified texts, it is possible to extract specific phrases or simple sentences of relevance in response to queries.
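
The claim in the parsing item, that fewer constituents mean less attachment ambiguity, can be made concrete with a small arithmetic sketch. The model below is an illustrative assumption and not part of the method discussed here: it supposes that a clause followed by k attachable modifiers (for instance prepositional phrases) has as many readings as the Catalan number C(k+1), a common idealisation of attachment ambiguity. The function names `catalan` and `attachment_readings` are hypothetical.

```python
from math import comb


def catalan(n: int) -> int:
    """Return the n-th Catalan number: 1, 1, 2, 5, 14, 42, ..."""
    return comb(2 * n, n) // (n + 1)


def attachment_readings(num_modifiers: int) -> int:
    """Readings of a clause with `num_modifiers` trailing modifiers.

    Assumption (toy model): the count is Catalan(num_modifiers + 1).
    """
    return catalan(num_modifiers + 1)


# One long sentence carrying four attachable modifiers.
long_sentence = attachment_readings(4)

# The same material split into two simpler sentences with two modifiers each.
# Parsed independently, the parser faces the sum of the two counts.
split_sentences = attachment_readings(2) + attachment_readings(2)

print(long_sentence, split_sentences)
```

Under this assumed model the single long sentence admits 42 readings, while the two simpler sentences present 5 readings each, so splitting reduces the search space even before any subparses are reassembled.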

