Corpus ID: 204509627

HuggingFace's Transformers: State-of-the-art Natural Language Processing

@article{Wolf2019HuggingFacesTS,
  title={HuggingFace's Transformers: State-of-the-art Natural Language Processing},
  author={Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Jamie Brew},
  journal={ArXiv},
  year={2019},
  volume={abs/1910.03771}
}
Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the-art…
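As an illustrative aside (not part of the abstract): a minimal sketch of the library's unified API, using the generic pipeline entry point; the task name and example sentence are arbitrary choices, not drawn from the paper.

```python
# Requires: pip install transformers torch
from transformers import pipeline

# A pipeline bundles a pretrained checkpoint, its tokenizer and pre/post-processing.
classifier = pipeline("sentiment-analysis")
print(classifier("Transformers makes state-of-the-art NLP easy to use."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```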
Poor Man's BERT: Smaller and Faster Transformer Models
TLDR: A number of memory-light model reduction strategies that do not require model pre-training from scratch are explored, which are able to prune BERT, RoBERTa and XLNet models by up to 40%, while maintaining up to 98% of their original performance.
Entity Matching with Transformer Architectures - A Step Forward in Data Integration
TLDR: This paper empirically compares the capability of transformer architectures and transfer learning on the task of entity matching (EM) and shows that transformer architectures outperform classical deep learning methods in EM by an average margin of 27.5%.
Investigating Transformers for Automatic Short Answer Grading
TLDR: This work trains the newest and most powerful Transformers (according to the GLUE benchmark) on the SemEval-2013 dataset, and shows that models trained with knowledge distillation are feasible for use in short answer grading.
Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks
TLDR: Evaluations of Transformer-based models on Natural Language Inference and Question Answering tasks show that RoBERTa, XLNet and BERT are more robust than recurrent neural network models under stress tests for both NLI and QA, while also indicating that there is still room for improvement in this field.
Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation
TLDR: Directed Beam Search, a plug-and-play method for lexically constrained language generation, is proposed; it can be applied to any language model, is easy to implement and can be used for general language generation.
GMAT: Global Memory Augmentation for Transformers
TLDR: This work proposes to augment sparse Transformer blocks with a dense attention-based global memory of length M (much smaller than the input length L), which provides an aggregate global view of the entire input sequence to each position, and empirically shows that this method leads to substantial improvement on a range of tasks.
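A rough sketch of the global-memory idea summarized above, under simplifying assumptions: dense attention is used throughout (the actual method pairs the memory with sparse attention over the long input), and all names and sizes here are illustrative.

```python
import torch
import torch.nn as nn

class GlobalMemoryBlock(nn.Module):
    """Toy block: prepend M learned memory slots so every position can read
    from (and write to) a small dense global summary of the sequence."""
    def __init__(self, d_model=256, n_memory=32, n_heads=8):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_memory, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):                          # x: (batch, L, d_model)
        mem = self.memory.expand(x.size(0), -1, -1)
        z = torch.cat([mem, x], dim=1)             # (batch, M + L, d_model)
        out, _ = self.attn(z, z, z)
        return out[:, self.memory.size(0):]        # drop the memory slots again
```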
Domain and Task Adaptive Pretraining for Language Models
TLDR: This paper confirms results from a recent study that continued pretraining on domain and task data substantially improves task performance, and finds that training a model from scratch using Electra is not competitive on the authors' data sets.
RuSentEval: Linguistic Source, Encoder Force!
TLDR: RuSentEval, an enhanced set of 14 probing tasks for Russian, including ones that have not been studied before, is introduced and used to examine the distribution of various linguistic properties in five multilingual transformers for two typologically contrasting languages.
Adaptation of Deep Bidirectional Transformers for Afrikaans Language
TLDR: The results show that AfriBERT improves the current state of the art in most of the tasks the authors considered, and that transfer learning from a multilingual to a monolingual model can yield a significant performance improvement on downstream tasks.
Deep Transformer based Data Augmentation with Subword Units for Morphologically Rich Online ASR
TLDR: This paper pre-trains a GPT-2 Transformer LM on a general text corpus, fine-tunes it on the authors' Hungarian conversational call-center ASR task, and shows that subword-based neural text augmentation outperforms the word-based approach not only in terms of overall WER but also in the recognition of OOV words.

References

SHOWING 1-10 OF 80 REFERENCES
AllenNLP: A Deep Semantic Natural Language Processing Platform
TLDR: AllenNLP is designed to support researchers who want to build novel language understanding models quickly and easily; it provides a flexible data API that handles intelligent batching and padding, and a modular and extensible experiment framework that makes doing good science easy.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TLDR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
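A minimal sketch of the "one additional output layer" point, using the transformers library; the 2-label task is a placeholder, and the classification head is randomly initialized until fine-tuned.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)   # BERT encoder + new classification head

batch = tok(["an example sentence to classify"], return_tensors="pt")
logits = model(**batch).logits           # shape (1, 2); fine-tune end-to-end on labelled data
```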
Reformer: The Efficient Transformer
TLDR: This work replaces dot-product attention by one that uses locality-sensitive hashing and uses reversible residual layers instead of the standard residuals, which allows storing activations only once in the training process instead of several times, making the model much more memory-efficient and much faster on long sequences.
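A toy sketch of the locality-sensitive-hashing idea mentioned above, under the assumption of Reformer-style shared query/key vectors: nearby vectors tend to land in the same bucket, so full attention can be restricted to within-bucket pairs. Function and argument names are illustrative.

```python
import torch

def lsh_buckets(qk: torch.Tensor, n_buckets: int) -> torch.Tensor:
    """Angular LSH: hash each vector by its nearest random (signed) rotation."""
    # qk: (seq_len, d_model) shared query/key vectors
    r = torch.randn(qk.shape[-1], n_buckets // 2)        # random projections
    rotated = qk @ r                                     # (seq_len, n_buckets/2)
    return torch.argmax(torch.cat([rotated, -rotated], dim=-1), dim=-1)
```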
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
TLDR: This systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks and achieves state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more.
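A small hedged example of the text-to-text framing, using the transformers library: the task is named by a plain-text prefix and the output is always generated text. The checkpoint name and prompt are illustrative.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# the task prefix turns translation into plain sequence-to-sequence generation
ids = tok("translate English to German: The house is wonderful.",
          return_tensors="pt").input_ids
print(tok.decode(model.generate(ids)[0], skip_special_tokens=True))
```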
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
TLDR: This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performance on a wide range of tasks like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.
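A sketch of the triple loss described above, with assumed equal weights and illustrative argument names; the actual training recipe tunes the weighting and applies the losses only at masked positions.

```python
import torch.nn.functional as F

def triple_loss(student_logits, teacher_logits, labels,
                student_hidden, teacher_hidden, T=2.0):
    # (1) soft-target distillation: KL between temperature-scaled distributions
    l_soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                      F.softmax(teacher_logits / T, dim=-1),
                      reduction="batchmean") * T * T
    # (2) ordinary masked-language-modelling cross-entropy on the hard labels
    l_mlm = F.cross_entropy(student_logits, labels)
    # (3) cosine loss aligning student and teacher hidden-state directions
    l_cos = 1.0 - F.cosine_similarity(student_hidden, teacher_hidden, dim=-1).mean()
    return l_soft + l_mlm + l_cos
```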
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
TLDR: This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.
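A minimal sketch of one of the two parameter reductions, factorized embedding parameterization; the sizes are illustrative. (The other reduction, cross-layer parameter sharing, reuses one Transformer block's weights across all layers.)

```python
import torch.nn as nn

# Project a small embedding size E up to the hidden size H, costing
# V*E + E*H parameters instead of V*H (V = vocabulary size).
V, E, H = 30000, 128, 768
factorized_embedding = nn.Sequential(nn.Embedding(V, E), nn.Linear(E, H))
```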
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
TLDR: A new benchmark styled after GLUE is presented, comprising a set of more difficult language understanding tasks, a software toolkit, and a public leaderboard.
exBERT: A Visual Analysis Tool to Explore Learned Representations in Transformer Models
TLDR: exBERT provides insights into the meaning of contextual representations and attention by matching a human-specified input to similar contexts in large annotated datasets, and can quickly replicate findings from the literature and extend them to previously unanalyzed models.
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
TLDR: This work proposes a novel neural architecture, Transformer-XL, that enables learning dependency beyond a fixed length without disrupting temporal coherence; it consists of a segment-level recurrence mechanism and a novel positional encoding scheme.
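A simplified sketch of the segment-level recurrence: hidden states cached from the previous segment are reused (without gradient) as extra keys and values for the current segment. The relative positional encoding and the causal mask are omitted, and all names are illustrative.

```python
import torch

def attend_with_memory(h, mem, w_q, w_k, w_v):
    # h:   (cur_len, d) hidden states of the current segment
    # mem: (mem_len, d) cached hidden states of the previous segment (no gradient)
    context = torch.cat([mem.detach(), h], dim=0)      # keys/values span both segments
    q, k, v = h @ w_q, context @ w_k, context @ w_v
    attn = torch.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v                                     # (cur_len, d)
```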
Transfer Learning in Natural Language Processing
TLDR: An overview of modern transfer learning methods in NLP is presented, covering how models are pre-trained, what information the representations they learn capture, and examples and case studies of how these models can be integrated and adapted in downstream NLP tasks.