Corpus ID: 237213301

Contrastive Language-Image Pre-training for the Italian Language

Authors: Federico Bianchi, Giuseppe Attanasio, Raphael Pisoni, Silvia Terragni, Gabriele Sarti, S. Veera Lakshmi
CLIP (Contrastive Language–Image Pre-training) is a recent multi-modal model that jointly learns representations of images and texts. The model is trained on a massive amount of English data and shows impressive performance on zero-shot classification tasks. Training the same model for a different language is not trivial: data in other languages may be scarce, and the model needs high-quality translations of the texts to guarantee good performance. In this paper, we present the…
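The zero-shot classification the abstract refers to works by embedding the image and one text prompt per candidate label into a shared space, then picking the label whose prompt is most similar to the image. A minimal sketch of that scoring step, with toy 4-dimensional embeddings standing in for real CLIP embeddings (the function name and vectors are illustrative, not the paper's code):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """Score one image embedding against one text embedding per label."""
    # L2-normalise so the dot product equals cosine similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                            # one similarity per label
    probs = np.exp(sims) / np.exp(sims).sum()   # softmax over labels
    return labels[int(np.argmax(probs))], probs

# Toy embeddings; a real CLIP model produces 512-d (or larger) vectors.
image_emb = np.array([0.9, 0.1, 0.0, 0.1])
text_embs = np.array([
    [1.0, 0.0, 0.0, 0.0],   # e.g. "una foto di un gatto"
    [0.0, 1.0, 0.0, 0.0],   # e.g. "una foto di un cane"
])
label, probs = zero_shot_classify(image_emb, text_embs, ["gatto", "cane"])
```

No labelled training data for the target classes is needed: only the prompt texts change, which is what makes the approach "zero-shot".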


Learning Transferable Visual Models From Natural Language Supervision
It is demonstrated that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet.
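The pairing task described above is typically trained with a symmetric contrastive (InfoNCE-style) loss: in a batch, the matched (image, text) pairs lie on the diagonal of the similarity matrix and every other entry is a negative. A hedged sketch of that loss, using toy one-hot embeddings (the function names and the temperature value are illustrative assumptions, not the paper's exact code):

```python
import numpy as np

def _logsumexp(x, axis):
    m = x.max(axis=axis, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=axis, keepdims=True))

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of (image, text) pairs."""
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = (img @ txt.T) / temperature   # (batch, batch) similarity matrix
    # Image-to-text: the correct "class" for image i is caption i (rows).
    i2t = logits - _logsumexp(logits, axis=1)
    # Text-to-image: the correct image for caption j is image j (columns).
    t2i = logits - _logsumexp(logits, axis=0)
    return -(np.diag(i2t).mean() + np.diag(t2i).mean()) / 2.0

# Perfectly aligned pairs should yield a loss near zero...
aligned = np.eye(3)
low = clip_contrastive_loss(aligned, aligned)
# ...while shuffled captions should be penalised.
high = clip_contrastive_loss(aligned, aligned[[1, 2, 0]])
```

Because both directions (image-to-text and text-to-image) are averaged, the encoders are pushed to agree symmetrically rather than one merely predicting the other.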
Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning
We present a new dataset of image caption annotations, Conceptual Captions, which contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) and represents a wider variety of both images and image caption styles.
wav2vec: Unsupervised Pre-training for Speech Recognition
Wav2vec is trained on large amounts of unlabeled audio data, and the resulting representations are then used to improve acoustic model training; it outperforms Deep Speech 2, the best reported character-based system in the literature, while using two orders of magnitude less labeled training data.
What the [MASK]? Making Sense of Language-Specific BERT Models
The current state of the art in language-specific BERT models is presented, providing an overall picture across different dimensions (i.e. architectures, data domains, and tasks) and an immediate, straightforward overview of their commonalities and differences.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Microsoft COCO: Common Objects in Context
We present a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding.
“You Sound Just Like Your Father” Commercial Machine Translation Systems Include Stylistic Biases
The findings suggest that translation models reflect demographic bias in the training data, which opens up interesting new research avenues in machine translation to take stylistic considerations into account.
SGDR: Stochastic Gradient Descent with Warm Restarts
This paper proposes a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks, and empirically studies its performance on the CIFAR-10 and CIFAR-100 datasets.
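The warm-restart schedule from SGDR can be sketched as cosine annealing that periodically resets: the learning rate decays from a maximum to a minimum over one cycle, then jumps back up, with each cycle optionally longer than the last. A minimal sketch (the function name and the default hyperparameters are illustrative assumptions):

```python
import math

def sgdr_lr(step, eta_min=0.0, eta_max=0.1, t_0=10, t_mult=2):
    """Learning rate under cosine annealing with warm restarts (SGDR).

    The rate decays from eta_max to eta_min over a cycle of t_0 steps,
    then restarts; each cycle is t_mult times longer than the previous one.
    """
    cycle_len = t_0
    # Locate the current position inside the current cycle.
    while step >= cycle_len:
        step -= cycle_len
        cycle_len *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (
        1.0 + math.cos(math.pi * step / cycle_len)
    )

lr_start = sgdr_lr(0)    # top of the first cycle: eta_max
lr_mid = sgdr_lr(5)      # halfway through: midpoint of eta_max and eta_min
lr_restart = sgdr_lr(10) # first restart: back to eta_max
```

The periodic jumps let SGD escape sharp minima late in training, which is why the paper reports good "anytime" performance: a usable model exists at the end of every cycle.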
Energy and Policy Considerations for Deep Learning in NLP
This paper quantifies the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP and proposes actionable recommendations to reduce costs and improve equity in NLP research and practice.
On the Gap between Adoption and Understanding in NLP
A position paper outlining five issues with current research trends in NLP that can hamper the free development of scientific research, and suggesting ways forward.