Part-of-Speech and Prosody-based Approaches for Robot Speech and Gesture Synchronization
This paper proposes a model based on three different approaches to extend humanoid robots' communication behaviour with upper-body gestures synchronized with speech for novel utterances, exploiting part-of-speech grammatical information, prosody cues, and a combination of both.
Improving the Quality of Video-to-Language Models by Optimizing Annotation of the Training Material
This work proposes automatic strategies for optimizing the annotations of video material, removing annotations that are not semantically relevant and generating new and more informative captions, and evaluates the approach on the MSR-VTT challenge with a state-of-the-art deep learning video-to-language model.
How much pretraining data do language models need to learn syntax?
The experiments show that while models pretrained on more data encode more syntactic knowledge and perform better on downstream applications, they do not always perform better across the different syntactic phenomena, and they come at a higher financial and environmental cost.
Cartography of Natural Language Processing for Social Good (NLP4SG): Searching for Definitions, Statistics and White Spots
- Paula Fortuna, Laura Pérez-Mayos, Ahmed Ghassan Tawfiq AbuRa'ed, Juan Soler-Company, L. Wanner
- Computer Science · NLP4POSIMPACT
A working definition of NLP4SG is proposed, and some primary aspects that are crucial for NLP4SG are identified, including areas, ethics, privacy and bias.
On the evolution of syntactic information encoded by BERT’s contextualized representations
- Laura Pérez-Mayos, Roberto Carlini, Miguel Ballesteros, L. Wanner
- Computer Science · EACL
- 27 January 2021
This paper analyzes the evolution of the embedded syntax trees over the course of fine-tuning BERT for six different tasks, covering all levels of the linguistic structure.
MIN_PT: An European Portuguese Lexicon for Minorities Related Terms
Hate speech-related lexicons have been proved to be useful for many tasks such as data collection and classification. However, existing Portuguese lexicons do not distinguish between European and…
Assessing the Syntactic Capabilities of Transformer-based Multilingual Language Models
- Laura Pérez-Mayos, Alba Táboas García, Simon Mille, L. Wanner
- Linguistics, Computer Science · FINDINGS
- 10 May 2021
This work explores the syntactic generalization capabilities of the monolingual and multilingual versions of BERT and RoBERTa, and introduces SyntaxGymES, a novel ensemble of targeted syntactic tests in Spanish, designed to evaluate the syntactic generalization abilities of language models through the SyntaxGym online platform.