Active Learning for BERT: An Empirical Study

Authors: Liat Ein-Dor, Alon Halfon, Ariel Gera, Eyal Shnarch, Lena Dankin, Leshem Choshen, Marina Danilevsky, Ranit Aharonov, Yoav Katz, Noam Slonim
Venue: Conference on Empirical Methods in Natural Language Processing
Real-world scenarios present a challenge for text classification, since labels are usually expensive and the data is often characterized by class imbalance. Active Learning (AL) is a ubiquitous paradigm for coping with data scarcity. Recently, pre-trained NLP models, and BERT in particular, have received massive attention due to their outstanding performance on various NLP tasks. However, the use of AL with deep pre-trained models has so far received little consideration. Here, we present a large…


Active learning for reducing labeling effort in text classification tasks

Results show that uncertainty-based AL with BERT-base outperforms random sampling of data, and the study examines the influence of the query-pool size on AL performance.
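The uncertainty-based selection referenced above is commonly implemented as entropy sampling over the model's softmax outputs. A minimal sketch (the function name and toy pool are illustrative, not from the paper; it assumes class probabilities are available as a NumPy array):

```python
import numpy as np

def uncertainty_sample(probs: np.ndarray, k: int) -> np.ndarray:
    """Select the k unlabeled examples whose predicted class
    distributions have the highest entropy (most uncertain)."""
    # probs: (n_examples, n_classes) softmax outputs from the model
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    # indices of the k most uncertain examples
    return np.argsort(-entropy)[:k]

# toy pool: one confident prediction, two less confident ones
pool = np.array([[0.99, 0.01],
                 [0.55, 0.45],
                 [0.90, 0.10]])
print(uncertainty_sample(pool, 2))  # → [1 2]
```

In a BERT AL loop, `probs` would come from a forward pass over the unlabeled pool after each fine-tuning round.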

A Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing

This survey provides guidance for NLP researchers and practitioners dealing with imbalanced data and covers approaches that have been explicitly proposed for class-imbalanced NLP tasks or, originating in the computer vision community, have been evaluated on them.

ALLIE: Active Learning on Large-scale Imbalanced Graphs

This work proposes a novel framework, ALLIE, which efficiently samples from both majority and minority classes using a reinforcement learning agent with an imbalance-aware reward function, and employs focal loss in the node classification model in order to focus more on the rare class and improve the accuracy of the downstream model.
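The focal loss mentioned here down-weights well-classified examples so that training concentrates on hard, often rare-class, instances. A minimal NumPy sketch (the function name and the toy inputs are illustrative; real implementations operate on logits inside the training framework):

```python
import numpy as np

def focal_loss(probs: np.ndarray, targets: np.ndarray, gamma: float = 2.0):
    """Focal loss: scales cross-entropy by (1 - p_t)^gamma, shrinking
    the contribution of easy (high-confidence) examples."""
    # probs: (n, n_classes) softmax outputs; targets: (n,) class indices
    p_t = probs[np.arange(len(targets)), targets]  # prob of the true class
    return -((1.0 - p_t) ** gamma) * np.log(p_t + 1e-12)

# an easy example (p_t = 0.9) contributes far less than a hard one (p_t = 0.3)
losses = focal_loss(np.array([[0.9, 0.1], [0.3, 0.7]]), np.array([0, 0]))
```

With `gamma = 0` this reduces to ordinary cross-entropy; larger `gamma` focuses the loss more strongly on misclassified examples.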

An Efficient Active Learning Pipeline for Legal Text Classification

This work proposes a pipeline for effectively using active learning with pre-trained language models in the legal domain, along with a simple yet effective strategy for assembling the initial set of labeled samples with fewer annotation actions than existing methods.

Deep Active Learning for Text Classification with Diverse Interpretations

A novel Active Learning with DivErse iNterpretations (ALDEN) approach, inspired by the piece-wise linear interpretability of DNNs, selects samples according to the diversity of their local interpretations and queries their labels to tackle the text classification problem.

ATM: An Uncertainty-aware Active Self-training Framework for Label-efficient Text Classification

ATM is a new framework that leverages self-training to exploit unlabeled data and is agnostic to the specific AL algorithm, serving as a plug-in module to improve existing AL methods.

Active Learning Over Multiple Domains in Natural Language Tasks

The first comprehensive analysis of both existing and novel methods for practitioners faced with multi-domain active learning in natural language tasks, covering 18 acquisition functions from 4 families of methods, finds that H-Divergence methods, and particularly the proposed variant DAL-E, yield effective results.

Towards Computationally Feasible Deep Active Learning

This work proposes two techniques that tackle the excessive computational resources required, in active learning, to train an acquisition model and estimate its uncertainty on instances in the unlabeled pool, and demonstrates that the proposed algorithm can train a more expressive successor model with higher performance.

AcTune: Uncertainty-aware Active Self-Training for Semi-Supervised Active Learning with Pretrained Language Models

Experiments show that AcTune outperforms the strongest active learning and self-training baselines and improves the label efficiency of PLM fine-tuning by 56.2% on average.

Is margin all you need? An extensive empirical study of active learning on tabular data

Surprisingly, it is found that the classical margin sampling technique matches or outperforms all others, including the current state of the art, in a wide range of experimental settings.
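Classical margin sampling, the technique this study finds so strong, queries the examples where the top two predicted classes are closest in probability. A minimal sketch (function name and toy pool are illustrative):

```python
import numpy as np

def margin_sample(probs: np.ndarray, k: int) -> np.ndarray:
    """Margin sampling: pick the k examples with the smallest gap
    between the top-2 predicted class probabilities."""
    srt = np.sort(probs, axis=1)          # ascending per row
    margin = srt[:, -1] - srt[:, -2]      # top-1 minus top-2 probability
    return np.argsort(margin)[:k]

pool = np.array([[0.80, 0.15, 0.05],
                 [0.40, 0.35, 0.25],
                 [0.34, 0.33, 0.33]])
print(margin_sample(pool, 1))  # → [2], the tightest margin
```

Unlike entropy, the margin criterion ignores probability mass outside the two leading classes, which may explain its robustness on tabular data.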

Sampling Bias in Deep Active Classification: An Empirical Study

This work demonstrates that active set selection using the posterior entropy of deep models like FTZ is robust to sampling biases and to various algorithmic choices (query size and strategies), contrary to what traditional literature suggests, and proposes a simple baseline for deep active text classification that outperforms the state of the art.

LTP: A New Active Learning Strategy for Bert-CRF Based Named Entity Recognition

An uncertainty-based active learning strategy called Lowest Token Probability (LTP) is proposed, which combines the input and output of the CRF to select informative instances and performs slightly better than traditional strategies, with markedly fewer annotated tokens, on both sentence-level accuracy and entity-level F1 score.
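The core idea can be sketched simply: score each sentence by the probability of its least-confident token under the predicted tag sequence, and query the lowest-scoring sentences. A minimal illustration (the function name is hypothetical, and per-token probabilities are assumed to be precomputed; the actual method derives them from a BERT-CRF model):

```python
import numpy as np

def ltp_scores(token_probs_per_sentence):
    """Lowest Token Probability (sketch): each sentence is scored by the
    probability of its least-confident token; lower score = query first."""
    return [min(probs) for probs in token_probs_per_sentence]

# per-token probabilities of the predicted tags for two sentences
sents = [[0.99, 0.98, 0.97],   # uniformly confident
         [0.95, 0.40, 0.99]]   # one very uncertain token
order = np.argsort(ltp_scores(sents))  # sentence 1 is queried first
```

Scoring by the single weakest token, rather than the whole-sequence probability, avoids penalizing long sentences that are mostly easy.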

Diverse mini-batch Active Learning

This work studies how to reduce the amount of labeled training data required for supervised classification models by leveraging Active Learning, the sequential selection of the examples that benefit the model most, and considers the mini-batch Active Learning setting, where several examples are selected at once.
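In the mini-batch setting, selecting the k most informative points naively yields near-duplicates, so informativeness is combined with diversity. A minimal sketch of that two-stage idea (names are illustrative; the paper clusters the candidates with weighted k-means, whereas this sketch substitutes a simpler greedy farthest-point step for the diversity stage):

```python
import numpy as np

def diverse_batch(embeddings, uncertainty, k, beta=5):
    """Two-stage diverse mini-batch selection: pre-filter the beta*k most
    uncertain examples, then pick k diverse ones among them greedily
    (farthest-point stand-in for the paper's k-means clustering)."""
    cand = np.argsort(-uncertainty)[: beta * k]   # informative candidates
    X = embeddings[cand]
    chosen = [0]  # start from the most uncertain candidate
    for _ in range(k - 1):
        # distance from every candidate to its nearest chosen point
        d = np.min(
            np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=-1),
            axis=1,
        )
        chosen.append(int(np.argmax(d)))  # add the farthest candidate
    return cand[chosen]
```

The `beta` factor trades off informativeness (small `beta`) against diversity (large `beta`).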

Practical Obstacles to Deploying Active Learning

It is shown that while AL may provide benefits when used with specific models and for particular domains, the benefits of current approaches do not generalize reliably across models and tasks.

Deep Bayesian Active Learning for Natural Language Processing: Results of a Large-Scale Empirical Study

A large-scale empirical study of deep active learning, addressing multiple tasks and, for each, multiple datasets, multiple models, and a full suite of acquisition functions, finds that across all settings, Bayesian active learning by disagreement significantly improves over i.i.d. baselines and usually outperforms classic uncertainty sampling.
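Bayesian active learning by disagreement (BALD) scores each example by the mutual information between its prediction and the model parameters, typically estimated from multiple Monte Carlo dropout passes. A minimal NumPy sketch (the function name is illustrative; it assumes the stacked stochastic softmax outputs are already available):

```python
import numpy as np

def bald_score(mc_probs: np.ndarray) -> np.ndarray:
    """BALD acquisition scores from T stochastic (MC-dropout) passes.
    mc_probs: (T, n_examples, n_classes) softmax outputs."""
    mean = mc_probs.mean(axis=0)
    # entropy of the mean prediction: total uncertainty
    h_mean = -np.sum(mean * np.log(mean + 1e-12), axis=1)
    # mean of the per-pass entropies: aleatoric (data) uncertainty
    e_h = -np.mean(np.sum(mc_probs * np.log(mc_probs + 1e-12), axis=2), axis=0)
    return h_mean - e_h  # epistemic part: high when the passes disagree

# example 0: passes disagree (high BALD); example 1: consistently uncertain
mc = np.array([[[0.9, 0.1], [0.5, 0.5]],
               [[0.1, 0.9], [0.5, 0.5]]])
scores = bald_score(mc)
```

The key property, visible above, is that BALD separates disagreement between passes from mere predictive uncertainty: a consistently 50/50 prediction scores near zero.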

Discriminative Active Learning

Experimental results show the proposed batch-mode active learning algorithm, Discriminative Active Learning, to be on par with state-of-the-art methods at medium and large query batch sizes, while being simple to implement and easy to extend to domains beyond classification tasks.

Active Learning for Convolutional Neural Networks: A Core-Set Approach

This work defines active learning as a core-set selection problem, choosing a set of points such that a model learned on the selected subset remains competitive on the remaining data points, and presents a theoretical result characterizing the performance of any selected subset using the geometry of the data points.
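The practical algorithm derived from this core-set formulation is greedy k-center selection: repeatedly add the unlabeled point farthest from its nearest already-selected point. A minimal sketch (function name and toy embeddings are illustrative; in practice the embeddings come from the network's penultimate layer):

```python
import numpy as np

def k_center_greedy(embeddings: np.ndarray, labeled_idx, k: int):
    """Core-set selection (greedy 2-approximation of k-center):
    iteratively pick the point farthest from the current selection."""
    selected = list(labeled_idx)
    for _ in range(k):
        # distance from each point to its nearest selected point
        d = np.min(
            np.linalg.norm(
                embeddings[:, None, :] - embeddings[selected][None, :, :],
                axis=-1,
            ),
            axis=1,
        )
        d[selected] = -np.inf          # never re-pick a selected point
        selected.append(int(np.argmax(d)))
    return selected[len(labeled_idx):]  # only the newly queried indices
```

Because selection depends only on distances, the criterion is purely geometric: no model confidence is used, in contrast to the uncertainty-based strategies above.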

An Ensemble Deep Active Learning Method for Intent Classification

Experimental results on both Chinese and English intent classification datasets suggest that the proposed ensemble deep active learning method can achieve state-of-the-art performance with less than half of the training data.

Active Discriminative Text Representation Learning

It is argued that AL strategies for multi-layered neural models should focus on selecting instances that most affect the embedding space (i.e., induce discriminative word representations), in contrast to traditional AL approaches, which specify higher-level objectives.