University of Groningen
ETC-NLG: End-to-end Topic-Conditioned Natural Language Generation
This work presents ETC-NLG, an approach that leverages topic modeling annotations to enable fully unsupervised end-to-end topic-conditioned natural language generation over emergent topics in unlabeled document collections.
UmBERTo-MTSA @ AcCompl-It: Improving Complexity and Acceptability Prediction with Multi-task Learning on Self-Supervised Annotations
- Gabriele Sarti
- Computer Science, EVALITA
- 10 November 2020
This work describes a self-supervised data augmentation approach used to improve learning models' performances when only a moderate amount of labeled data is available, obtaining considerable improvements in prediction quality.
IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation
The monolingual IT5 models are found to provide the best scale-to-performance ratio across tested models, consistently outperforming their multilingual counterparts and setting a new state-of-the-art for most Italian conditional language generation tasks.
That Looks Hard: Characterizing Linguistic Complexity in Humans and Language Models
This paper investigates the relationship between two complementary perspectives in the human assessment of sentence complexity and how they are modeled in a neural language model (NLM), highlighting how linguistic information encoded in representations changes when the model learns to predict complexity.
Italian Transformers Under the Linguistic Lens
- Alessio Miaschi, Gabriele Sarti, Dominique Brunato, F. Dell’Orletta, Giulia Venturi
This work investigates whether and how the choice of probing-model architecture affects how well Italian transformers encode a wide spectrum of linguistic features, and whether these effects vary across textual genres.
Contrastive Language-Image Pre-training for the Italian Language
- Federico Bianchi, Giuseppe Attanasio, Raphael Pisoni, Silvia Terragni, Gabriele Sarti, S. Lakshmi
- Computer Science, Linguistics, ArXiv
- 19 August 2021
The first CLIP model for the Italian language (CLIP-Italian), trained on more than 1.4 million image-text pairs, is presented; results show that CLIP-Italian outperforms the multilingual CLIP model on image retrieval and zero-shot classification.
Teaching NLP with Bracelets and Restaurant Menus: An Interactive Workshop for Italian Students
An interactive workshop designed to illustrate the basic principles of NLP and computational linguistics to Italian high school students aged 13 to 18, in the form of a game in which participants play the role of machines that must solve some of the most common problems a computer faces in understanding language.
A dissemination workshop for introducing young Italian students to NLP
We describe and make available the game-based material developed for a laboratory run at several Italian science festivals to popularize NLP among young students.
ArchiMeDe @ DANKMEMES: A New Model Architecture for Meme Detection
The ArchiMeDe system combines information from visual and textual sources through a multimodal neural ensemble to predict whether input images and their respective metadata are memes; each pre-trained neural network in the ensemble is first fine-tuned individually on the training dataset to perform domain adaptation.
InDeep × NMT: Empowering Human Translators via Interpretable Neural Machine Translation
New tools and methodologies are developed to improve prediction attribution, error analysis, and controllable generation for neural machine translation systems, with a focus on empowering human translators.