Sanja Stajner

Learn More
Simplification of lexically complex texts, by replacing complex words with their simpler synonyms, helps non-native speakers, children, and language-impaired people understand text better. Recent lexical simplification methods rely on manually simplified corpora, which are expensive and time-consuming to build. We present an unsupervised approach to lexical(More)
In the last few years, there has been a growing number of studies addressing the Text Simplification (TS) task as a monolingual machine translation (MT) problem which translates from ‘original’ to ‘simple’ language. Motivated by those results, we investigate the influence of quality vs quantity of the training data on the effectiveness of such a MT approach(More)
Newswire text is often linguistically complex and stylistically decorated, hence very difficult to comprehend for people with reading disabilities. Acknowledging that events represent the most important information in news, we propose an eventcentered approach to news simplification. Our method relies on robust extraction of factual events and elimination(More)
This study presents the results of an initial phase of a project seeking to convert texts into a more accessible form for people with autism spectrum disorders by means of text simplification technologies. Random samples of Simple Wikipedia articles are compared with texts from News, Health, and Fiction genres using four standard readability indices(More)
This study addresses the automatic simplification of texts in Spanish in order to make them more accessible to people with cognitive disabilities. A corpus analysis of original and manually simplified news articles was undertaken in order to identify and quantify relevant operations to be implemented in a text simplification system. The articles were(More)
This study explores the possibility of replacing the costly and time-consuming human evaluation of the grammaticality and meaning preservation of the output of text simplification (TS) systems with some automatic measures. The focus is on six widely used machine translation (MT) evaluation metrics and their correlation with human judgements of(More)
This paper addresses the problem of automatic evaluation of text simplification systems for Spanish. We test whether already-existing readability formulae would be suitable for this task. We adapt three existing readability indices (two measuring lexical complexity and one measuring syntactic complexity) to be computed automatically, which are then applied(More)
In this paper we present the results of a study investigating the diachronic changes of four stylistic features: average sentence length, Automated Readability Index, lexical density and lexical richness in 20th century written English language. All experiments were conducted on the largest existing diachronic corpora of British and American English – the(More)
In this paper, we explore statistical machine translation (SMT) approaches to automatic text simplification (ATS) for Spanish. First, we compare the performances of the standard phrase-based (PB) and hierarchical (HIERO) SMT models in this specific task. In both cases, we build two models, one using the TS corpus with “light” simplifications and the other(More)