A Survey of Deep Active Learning

Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Xiaojiang Chen, Xin Wang
ACM Computing Surveys (CSUR), pp. 1–40
Active learning (AL) attempts to maximize a model's performance gain while annotating as few samples as possible. Deep learning (DL) is greedy for data: it requires a large supply of training data to optimize its massive number of parameters and learn to extract high-quality features. In recent years, the rapid development of internet technology has brought an era of information abundance characterized by massive amounts of available data. As a result, DL has…
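The pool-based AL loop the survey studies can be sketched as a single uncertainty-sampling round: score each unlabeled sample by predictive entropy and send the most uncertain ones to the oracle. The toy pool, class probabilities, and budget below are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def entropy(probs):
    """Predictive entropy of each row of class probabilities."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_batch(probs, budget):
    """Pick the `budget` pool indices the model is least certain about."""
    scores = entropy(probs)
    return np.argsort(-scores)[:budget]

# Toy pool of 4 unlabeled samples with 3-class softmax outputs.
pool_probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # near-uniform, maximally uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],
])
print(select_batch(pool_probs, budget=2))  # → [1 3]
```

In a full loop, the selected samples would be labeled, moved from the pool to the training set, and the model retrained before the next round.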

Figures and Tables from this paper

Temporal Output Discrepancy for Loss Estimation-based Active Learning

A novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incorporate high loss, and shows that TOD can be utilized to select the best model of potentially the highest testing accuracy from a pool of candidate models.
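The TOD cue can be illustrated by scoring each unlabeled sample by how far the model's output moved between two training checkpoints, and querying the sample with the largest shift. The checkpoint outputs below are stand-in arrays, not the paper's networks.

```python
import numpy as np

def temporal_output_discrepancy(out_prev, out_curr):
    """Per-sample L2 distance between outputs at consecutive checkpoints.
    A large discrepancy serves as a proxy for high loss on that sample."""
    return np.linalg.norm(out_curr - out_prev, axis=1)

# Stand-in softmax outputs of the same pool at checkpoints t-1 and t.
out_t0 = np.array([[0.9, 0.1], [0.6, 0.4], [0.50, 0.50]])
out_t1 = np.array([[0.9, 0.1], [0.2, 0.8], [0.55, 0.45]])

scores = temporal_output_discrepancy(out_t0, out_t1)
query = int(np.argmax(scores))  # sample whose prediction moved most
print(query)  # → 1
```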

Towards Exploring the Limitations of Active Learning: An Empirical Study

The existence of a trade-off between labeling effort and different model qualities is demonstrated, paving the way for future research in devising data selection metrics that consider multiple quality criteria.

ImitAL: Learning Active Learning Strategies from Synthetic Data

This work proposes ImitAL, a novel query strategy that encodes AL as a learning-to-rank problem, and shows that the approach is more runtime-performant than most other strategies, especially on very large datasets.

Low Budget Active Learning: Theory and Algorithms

This work focuses on the relation between the number of labeled examples (budget size) and suitable querying strategies, and proposes TypiClust and ProbCover, two deep active learning strategies suited for low budgets.

A Survey on Active Deep Learning: From Model Driven to Data Driven

This survey categorizes ADL into model-driven ADL and data-driven ADL according to whether the selector is model-driven or data-driven, and points out that, with the development of deep learning, the selector in ADL is shifting from model-driven to data-driven.

Towards General and Efficient Active Learning

A novel general and efficient active learning (GEAL) method that can conduct data selection processes on different datasets with a single-pass inference of the same model, and proposes knowledge clusters that are easily extracted from the intermediate features of the pre-trained network.

A Survey of Learning on Small Data

This survey follows agnostic active sampling under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data in both supervised and unsupervised fashions.

Towards Robust Deep Active Learning for Scientific Computing

This work investigates the robustness of pool-based DAL methods for scientific computing problems (dominated by regression) where DNNs are increasingly used and proposes the first query synthesis DAL method for regression, termed NA-QBC, which removes the sensitive γ hyperparameter.

ALWars: Combat-Based Evaluation of Active Learning Strategies

An interactive system with a rich set of features for comparing AL strategies in a novel replay-view mode over all AL episodes, with many available visualizations and metrics; it supports a rich variety of AL strategies via the API of the powerful AL framework ALiPy.

Accelerating Diversity Sampling for Deep Active Learning By Low-Dimensional Representations

This work proposes to use the low-dimensional vector of predicted probabilities instead, which can be seamlessly integrated into existing methods and empirically demonstrates that this considerably decreases the query time, i.e., time to select an instance for annotation, while at the same time improving results.



A new active labeling method for deep learning

  • Dan Wang, Yi Shang
  • Computer Science
    2014 International Joint Conference on Neural Networks (IJCNN)
  • 2014
A new active labeling method, AL-DL, for cost-effective selection of data to be labeled, which outperforms random labeling consistently and is applied to deep learning networks based on stacked restricted Boltzmann machines, as well as stacked autoencoders.

Learning Loss for Active Learning

  • Donggeun Yoo, In-So Kweon
  • Computer Science
    2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • 2019
A novel active learning method that is simple but task-agnostic, and works efficiently with the deep networks, by attaching a small parametric module, named ``loss prediction module,'' to a target network, and learning it to predict target losses of unlabeled inputs.

DeActive: Scaling Activity Recognition with Active Deep Learning

This paper proposes a deep and active learning enabled activity recognition model, DeActive, which is optimized for the problem domain and reduces resource requirements, and which incorporates active learning to minimize the human supervision and effort needed for compiling ground truth.

Sampling Bias in Deep Active Classification: An Empirical Study

This work demonstrates that active set selection using the posterior entropy of deep models like FastText.zip (FTZ) is robust to sampling biases and to various algorithmic choices (query size and strategies), contrary to what traditional literature suggests, and proposes a simple baseline for deep active text classification that outperforms the state of the art.

A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions

This survey provides a new perspective on NAS, starting with an overview of the characteristics of the earliest NAS algorithms, summarizing the problems in these early NAS algorithms, and then giving solutions from subsequent related research work.

Active Learning for Convolutional Neural Networks: A Core-Set Approach

This work defines the problem of active learning as core-set selection, i.e., choosing a set of points such that a model learned over the selected subset is competitive for the remaining data points, and presents a theoretical result characterizing the performance of any selected subset using the geometry of the data points.
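The core-set objective is commonly approximated with a k-center greedy rule: repeatedly add the pool point whose distance to its nearest already-selected point is largest. The tiny 2-D embeddings and labeled index below are illustrative stand-ins for real network features.

```python
import numpy as np

def k_center_greedy(features, labeled_idx, budget):
    """Greedy 2-approximation of the k-center objective: each step
    picks the point farthest from the current selected set."""
    selected = features[list(labeled_idx)]
    # Distance from every point to its nearest selected point.
    dists = np.min(
        np.linalg.norm(features[:, None, :] - selected[None, :, :], axis=2),
        axis=1,
    )
    picks = []
    for _ in range(budget):
        i = int(np.argmax(dists))
        picks.append(i)
        # Picking i may bring other points closer to the selected set.
        dists = np.minimum(dists, np.linalg.norm(features - features[i], axis=1))
    return picks

# Illustrative 2-D embeddings; point 0 is already labeled.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [-4.0, 0.0]])
print(k_center_greedy(feats, labeled_idx=[0], budget=2))  # → [3, 4]
```

Note how the second pick jumps to the far-left point rather than the second point of the upper-right pair: the greedy rule spreads the queried batch across the feature space instead of clustering it.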

Bayesian Generative Active Deep Learning

This paper proposes a Bayesian generative active deep learning approach that combines active learning with data augmentation, and provides theoretical and empirical evidence that this approach yields more efficient training and better classification results than data augmentation or active learning alone.

Training Data Distribution Search with Ensemble Active Learning

This paper proposes to scale up ensemble active learning methods to perform acquisition at large scale (10k to 500k samples at a time) with ensembles of hundreds of models, obtained at minimal computational cost by reusing intermediate training checkpoints, in order to automatically and efficiently perform a training-data distribution search for large labeled datasets.

Cost-Effective Active Learning for Deep Image Classification

This paper proposes a novel active learning (AL) framework that builds a competitive classifier with optimal feature representation from a limited number of labeled training instances in an incremental learning manner, and incorporates deep convolutional neural networks into AL.

Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision

A low-complexity method for feature density matching using a self-supervised Fisher kernel (FK), along with several novel pseudo-label estimators, that outperforms state-of-the-art methods on MNIST, SVHN, and ImageNet classification while requiring only 1/10th of the processing.