The brWaC Corpus: A New Open Resource for Brazilian Portuguese
- Jorge Wagner, Rodrigo Wilkens, M. Idiart, Aline Villavicencio
- Computer Science · International Conference on Language Resources…
- 1 May 2018
In this work, we present the construction process of a large Web corpus for Brazilian Portuguese, aiming to achieve a size comparable to the state of the art in other languages. We also discuss our…
B2SG: a TOEFL-like Task for Portuguese
- Rodrigo Wilkens, Leonardo Zilio, Eduardo Ferreira, Aline Villavicencio
- Computer Science · International Conference on Language Resources…
- 1 May 2016
The BabelNet-Based Semantic Gold Standard (B2SG) was automatically constructed from BabelNet and partly evaluated by human judges; it can be used as a basis for evaluating the accuracy of similarity relations in distributional thesauri.
Size Does Not Matter. Frequency Does. A Study of Features for Measuring Lexical Complexity
- Rodrigo Wilkens, A. Vecchia, Marcely Zanon Boito, Muntsa Padró, Aline Villavicencio
- Linguistics · Ibero-American Conference on AI
- 24 November 2014
Interestingly, the results show that word length is not important, while corpus frequency alone is enough to correctly classify a large proportion of the test cases (F-measure over 80%).
Crawling by Readability Level
- Jorge Wagner, Rodrigo Wilkens, Leonardo Zilio, M. Idiart, Aline Villavicencio
- Computer Science · International Conference on Computational…
- 13 July 2016
A framework for the automatic generation of large corpora classified by readability is proposed. It uses supervised learning to add to a crawler a readability filter based on features with low computational cost, so that the crawler collects texts targeted at a specific reading level.
Using NLP for Enhancing Second Language Acquisition
- Leonardo Zilio, Rodrigo Wilkens, Cedric Fairon
- Computer Science, Linguistics · Recent Advances in Natural Language Processing
- 10 November 2017
This study presents SMILLE, a system that draws on the Noticing Hypothesis and on input enhancements, addressing the lack of salience of grammatical information in online documents chosen by a given…
PassPort: A Dependency Parsing Model for Portuguese
- Leonardo Zilio, Rodrigo Wilkens, Cedric Fairon
- Computer Science · International Conference on Computational…
- 24 September 2018
PassPort, a model for the dependency parsing of Portuguese trained with the Stanford Parser, is introduced. Its dependency parsing results are very similar to those of PALAVRAS, with a LAS of 85.02 for PassPort against 84.36 for PALAVRAS.
Enhancing Grammatical Structures in Web-Based Texts.
- Leonardo Zilio, Rodrigo Wilkens, Cedric Fairon
- Computer Science, Linguistics
- 1 August 2017
The SMILLE system is presented. It uses Natural Language Processing to enhance grammatical information in texts chosen by a given user, and it is designed to draw users’ attention to specific grammatical structures and thus help them notice their occurrence in authentic contexts.
Automatic Construction of Large Readability Corpora
- Jorge Wagner, Rodrigo Wilkens, Aline Villavicencio
- Computer Science · CL4LC@COLING
- 1 December 2016
A framework for the automatic construction of large Web corpora classified by readability level is presented, together with a collection of 1.7 million documents and about 1.6 billion tokens, already parsed and annotated with 134 different textual attributes, and the agreement among the various classifiers.
HECTOR: A Hybrid TExt SimplifiCation TOol for Raw Texts in French
- A. Todiraşcu, Rodrigo Wilkens, Eva Rolin, Thomas François, D. Bernhard, Núria Gala
- Computer Science, Linguistics · International Conference on Language Resources…
- 2022
This paper presents a system that combines methods relying on word embeddings for lexical simplification with rule-based strategies for syntax and discourse adaptations. It also presents an evaluation of the lexical, syntactic and discourse-level simplifications according to automatic and human evaluations.
FABRA: French Aggregator-Based Readability Assessment toolkit
- Rodrigo Wilkens, David Alfter, Thomas François
- Computer Science · International Conference on Language Resources…
- 2022
The FABRA readability toolkit is implemented as a service-oriented architecture, which obviates the need for installation and simplifies its integration into other projects; it has the potential to support new research on readability assessment for French.