Motivations and Methods for Text Simplification

Abstract

Long and complicated sentences prove to be a stumbling block for current systems relying on NL input. These systems stand to gain from methods that syntactically simplify such sentences. To simplify a sentence, we need an idea of the structure of the sentence, to identify the components to be separated out. Obviously a parser could be used to obtain the complete structure of the sentence. However, full parsing is slow and prone to failure, especially on complex sentences. In this paper, we consider two alternatives to full parsing which could be used for simplification. The first approach uses a Finite State Grammar (FSG) to produce noun and verb groups while the second uses a Supertagging model to produce dependency linkages. We discuss the impact of these two input representations on the simplification process.

1 Reasons for Text Simplification

Long and complicated sentences prove to be a stumbling block for current systems which rely on natural language input. These systems stand to gain from methods that preprocess such sentences so as to make them simpler. Consider, for example, the following sentence:

(1) The embattled Major government survived a crucial vote on coal pits closure as its last-minute concessions curbed the extent of Tory revolt over an issue that generated unusual heat in the House of Commons and brought the miners to London streets.

Such sentences are not uncommon in newswire texts. Compare this with the multisentence version which has been manually simplified:

(2) The embattled Major government survived a crucial vote on coal pits closure. Its last-minute concessions curbed the extent of Tory revolt over the coal-mine issue. This issue generated unusual heat in the House of Commons. It also brought the miners to London streets.

* On leave from the National Centre for Software Technology, Gulmohar Cross Road No. 9, Juhu, Bombay 400 049, India

If complex text can be made simpler, sentences become easier to process, both for programs and humans. We discuss a simplification process which identifies components of a sentence that may be separated out, and transforms each of these into free-standing simpler sentences. Clearly, some nuances of meaning from the original text may be lost in the simplification process. Simplification is therefore inappropriate for texts (such as legal documents) where it is important not to lose any nuance. However, one can conceive of several areas of natural language processing where such simplification would be of great use. This is especially true in domains such as machine translation, which commonly have a manual post-processing stage, where semantic and pragmatic repairs may be carried out if necessary.

• Parsing: Syntactically complex sentences are likely to generate a large number of parses, and may cause parsers to fail altogether. Resolving ambiguities in attachment of constituents is nontrivial. This ambiguity is reduced for simpler sentences since they involve fewer constituents. Thus simpler sentences lead to faster parsing and less parse ambiguity. Once the parses for the simpler sentences are obtained, the subparses can be assembled to form a full parse, or left as is, depending on the application.

• Machine Translation (MT): As in the parsing case, simplification results in simpler sentential structures and reduced ambiguity. As argued in (Chandrasekar, 1994), this could lead to improvements in the quality of machine translation.

• Information Retrieval: IR systems usually retrieve large segments of texts of which only a part may be relevant. With simplified texts, it is possible to extract specific phrases or simple sentences of relevance in response to queries.
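
To make the idea of separating out a component more concrete, the following is a minimal sketch of a single hand-written rule that lifts a "that" relative clause into a free-standing sentence, in the spirit of the manually simplified example earlier in this section. It is an illustration only: the regular expression, the function name `split_relative_clause`, and the choice to restate the noun as "This <noun>" are assumptions made here, not the FSG or Supertagging approach the paper goes on to discuss.

```python
import re
from typing import List

# Toy rule (an assumption for illustration, not the paper's method):
#   "<main clause ending in a/an/the NOUN> that <relative clause>"
# becomes
#   "<main clause>."  +  "This NOUN <relative clause>."
RELATIVE_CLAUSE = re.compile(
    r"^(?P<main>.*\b(?:a|an|the)\s+(?P<noun>\w+))\s+that\s+(?P<rel>.+?)\.?$"
)


def split_relative_clause(sentence: str) -> List[str]:
    """Split one 'that' relative clause off into its own sentence."""
    text = sentence.strip()
    match = RELATIVE_CLAUSE.match(text)
    if match is None:
        # The rule does not apply; return the sentence unchanged.
        return [text]
    return [
        match.group("main") + ".",
        "This " + match.group("noun") + " " + match.group("rel") + ".",
    ]


if __name__ == "__main__":
    example = (
        "Its last-minute concessions curbed the extent of Tory revolt "
        "over an issue that generated unusual heat in the House of Commons."
    )
    for simple_sentence in split_relative_clause(example):
        print(simple_sentence)
```

A surface pattern like this has no notion of sentence structure, so it will misfire on many inputs (for example, "that" used as a complementizer). That limitation is the reason the text argues that some structural analysis of the sentence is needed before simplifying.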

[Figure: Citations per Year]

Semantic Scholar estimates that this publication has 167 citations based on the available data.


Cite this paper

@inproceedings{Chandrasekar1996MotivationsAM, title={Motivations and Methods for Text Simplification}, author={Raman Chandrasekar and Christine Doran and Srinivas Bangalore}, booktitle={COLING}, year={1996} }