Publications
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performance on a wide range of tasks like its larger counterparts. It introduces a triple loss combining language modeling, distillation and cosine-distance losses.
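The triple loss described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the logits, hidden vectors, temperature and the `alpha_*` weights are all hypothetical placeholders, and real training would operate on tensors with autograd.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened distribution."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

def mlm_loss(student_logits, true_index):
    """Standard masked-language-modeling cross-entropy on the hard label."""
    return -math.log(softmax(student_logits)[true_index])

def cosine_loss(h_student, h_teacher):
    """1 minus cosine similarity, aligning student and teacher hidden states."""
    dot = sum(a * b for a, b in zip(h_student, h_teacher))
    norm_s = math.sqrt(sum(a * a for a in h_student))
    norm_t = math.sqrt(sum(b * b for b in h_teacher))
    return 1.0 - dot / (norm_s * norm_t)

def triple_loss(student_logits, teacher_logits, true_index,
                h_student, h_teacher,
                alpha_ce=1.0, alpha_mlm=1.0, alpha_cos=1.0):
    """Weighted sum of distillation, language-modeling and cosine losses."""
    return (alpha_ce * distillation_loss(student_logits, teacher_logits)
            + alpha_mlm * mlm_loss(student_logits, true_index)
            + alpha_cos * cosine_loss(h_student, h_teacher))
```

The weighted-sum form makes it easy to ablate each term by zeroing its weight.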
HuggingFace's Transformers: State-of-the-art Natural Language Processing
The Transformers library is an open-source library that provides carefully engineered state-of-the-art Transformer architectures under a unified API, together with a curated collection of pretrained models made by and available for the community.
TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents
A new approach to generative data-driven dialogue systems (e.g. chatbots), called TransferTransfo, is introduced. It combines a transfer-learning-based training scheme with a high-capacity Transformer model and shows strong improvements over current state-of-the-art end-to-end conversational models.
A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
A hierarchical model trained in a multi-task learning setup on a set of carefully selected semantic tasks achieves state-of-the-art results on several tasks, namely Named Entity Recognition, Entity Mention Detection and Relation Extraction, without hand-engineered features or external NLP tools such as syntactic parsers.
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Experiments show that when pruning large pretrained language models, movement pruning yields significant improvements in high-sparsity regimes; combined with distillation, the approach achieves minimal accuracy loss with as little as 3% of the model parameters remaining.
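The core idea of movement pruning, keeping weights that move away from zero during fine-tuning rather than those that are simply large, can be sketched with a running importance score per weight. This is an illustrative sketch on plain Python lists with a hypothetical learning rate; the actual method trains the scores with a straight-through estimator over full weight matrices.

```python
def movement_scores(weights, grads, scores, lr=0.01):
    """Accumulate importance S_i <- S_i - lr * g_i * w_i.

    A weight being pushed away from zero (gradient and weight of
    opposite sign) gains score; one being pulled toward zero loses it.
    """
    return [s - lr * g * w for s, g, w in zip(scores, grads, weights)]

def topk_mask(scores, sparsity):
    """Keep the top (1 - sparsity) fraction of weights by movement score."""
    k = max(1, int(round(len(scores) * (1.0 - sparsity))))
    threshold = sorted(scores, reverse=True)[k - 1]
    return [1 if s >= threshold else 0 for s in scores]
```

Note the contrast with magnitude pruning: a large weight that the fine-tuning gradients keep pushing toward zero will accumulate a negative score and be pruned, while a small weight growing in magnitude will be kept.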
Transformers: State-of-the-art Natural Language Processing
Recent advances in modern Natural Language Processing (NLP) research have been dominated by the combination of Transfer Learning methods with large-scale Transformer language models. …
Some additional experiments extending the tech report "Assessing BERT's Syntactic Abilities" by Yoav Goldberg
This document reports a few additional experiments extending Yoav Goldberg's tech report "Assessing BERT's Syntactic Abilities", which can be found at http://u.cs.biu.ac.il/~yogo/bert-syntax.pdf. …
Toward terahertz heterodyne detection with superconducting Josephson junctions
We report on the high-frequency mixing properties of ion-irradiated YBa2Cu3O7 Josephson junctions. The frequency range, spanning above and below the characteristic frequencies fc of the junctions, …
HTS Josephson junctions arrays for high-frequency mixing
We designed, fabricated and measured short one-dimensional arrays of masked ion-irradiated YBa2Cu3O7 Josephson junctions embedded into log-periodic spiral antennas. They consist of 4 or 8 …
Anisotropic optical properties of detwinned BaFe2As2
The optical properties of a large, detwinned single crystal of BaFe$_2$As$_2$ have been examined over a wide frequency range above and below the structural and magnetic transition at $T_{\rm N} \simeq$ …