ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations
TLDR
ASSET is a crowdsourced multi-reference corpus where each simplification was produced by executing several rewriting transformations, and it is shown that simplifications in ASSET are better at capturing characteristics of simplicity when compared to other standard evaluation datasets for the task.
Learning How to Simplify From Explicit Labeling of Complex-Simplified Text Pairs
TLDR
A way to automatically identify operations in a parallel corpus is devised, and a sequence-labeling approach based on these annotations is introduced, which provides insights on the types of transformations that different approaches can model.
Data-Driven Sentence Simplification: Survey and Benchmark
TLDR
Research on Sentence Simplification (SS) is surveyed, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is currently the dominant paradigm.
MASSAlign: Alignment and Annotation of Comparable Documents
We introduce MASSAlign: a Python library for the alignment and annotation of monolingual comparable documents. MASSAlign offers easy-to-use access to state-of-the-art algorithms for paragraph and
EASSE: Easier Automatic Sentence Simplification Evaluation
TLDR
This work introduces EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems, and shows that these functionalities allow for better comparison and understanding of the performance of SS systems.
Controllable Text Simplification with Explicit Paraphrasing
TLDR
A novel hybrid approach is proposed that leverages linguistically-motivated rules for splitting and deletion and couples them with a neural paraphrasing model to produce varied rewriting styles, establishing a new state of the art for the task.
Semantic Role Labeling for Brazilian Portuguese: A Benchmark
TLDR
This work presents a benchmark for comparing SRL systems for Brazilian Portuguese, based on the CoNLL Shared Tasks on SRL for English, and implements a supervised SRL system which outperforms the baseline (17 points better in F1 measure).
Coh-Metrix-Esp: A Complexity Analysis Tool for Documents Written in Spanish
TLDR
This paper presents the Spanish version of Coh-Metrix, which can calculate 45 readability indices, analyses how these indices behave in a corpus of “simple” and “complex” documents, and uses them as features in a binary complexity classifier for texts in Spanish.
Strong Baselines for Complex Word Identification across Multiple Languages
TLDR
This paper presents monolingual and cross-lingual CWI models that perform as well as (or better than) most models submitted to the latest CWI Shared Task, and shows that carefully selected features and simple learning models can achieve state-of-the-art performance.
Towards Semi-supervised Brazilian Portuguese Semantic Role Labeling: Building a Benchmark
TLDR
This paper proposes a semi-supervised approach capable of taking advantage of both the annotated and unannotated data available, and outlines the methodology for the development of this SRL system, as well as the benchmark to be used to test its performance.