Corpus ID: 235247921

AutoSampling: Search for Effective Data Sampling Schedules

@article{Sun2021AutoSamplingSF,
  title={AutoSampling: Search for Effective Data Sampling Schedules},
  author={Ming Sun and Hao Dou and Baopu Li and Lei Cui and Junjie Yan and Wanli Ouyang},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.13695}
}
Data sampling plays a pivotal role in training deep learning models. However, an effective sampling schedule is difficult to learn due to the inherently high dimensionality of its parameter space. In this paper, we propose an AutoSampling method to automatically learn sampling schedules for model training, which consists of a multi-exploitation step that searches for optimal local sampling schedules and an exploration step that estimates the ideal sampling distribution. More specifically…
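
The abstract cuts off here, but the exploit/explore structure it describes maps naturally onto a population-based search. Below is a minimal Python sketch of that structure under stated assumptions: train_and_eval is a hypothetical stand-in for short training runs scored on validation data, and the perturbation scheme is illustrative, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_and_eval(weights):
    """Hypothetical stand-in: briefly train while drawing mini-batches with
    per-sample probabilities `weights`, then return a validation score.
    Here a toy objective rewards schedules near an (unknown) ideal one."""
    ideal = np.linspace(0.5, 1.5, len(weights))
    return -np.abs(weights - ideal).mean() + rng.normal(0, 0.01)

n_samples, pop_size, rounds = 1_000, 8, 5
population = [np.ones(n_samples) for _ in range(pop_size)]

for _ in range(rounds):
    # Multi-exploitation: each member tries several local perturbations of
    # its sampling schedule and keeps the best-scoring variant.
    exploited = []
    for w in population:
        candidates = [w * np.exp(0.1 * rng.normal(size=n_samples))
                      for _ in range(4)]
        exploited.append(max(candidates, key=train_and_eval))
    # Exploration: aggregate the top schedules into an estimated sampling
    # distribution and re-sample the population around it.
    exploited.sort(key=train_and_eval, reverse=True)
    center = np.mean(exploited[: pop_size // 2], axis=0)
    population = [center * np.exp(0.05 * rng.normal(size=n_samples))
                  for _ in range(pop_size)]

best_schedule = max(population, key=train_and_eval)
```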

Citations

Boosting Supervised Dehazing Methods via Bi-level Patch Reweighting

The gradients of BILD exhibit natural connections with policy gradients, so the BILD objective can be explained by the rewarding mechanism of reinforcement learning; BILD is best understood as a flexible framework that can work seamlessly with general supervised dehazing approaches to boost their performance.

Automatic Document Selection for Efficient Encoder Pretraining

This work extends Cynical Data Selection, a statistical sentence-scoring method that conditions on a representative target-domain corpus, and shows it consistently outperforms random selection with 20x less data, 3x fewer training iterations, and 2x lower estimated cloud compute cost.

A Survey of Data Optimization for Problems in Computer Vision Datasets

A first review of recent advances in data optimization, which defines data optimization and classifies data optimization algorithms into three directions according to the optimization form: data sampling, data subset selection, and active learning.

References

Not All Samples Are Created Equal: Deep Learning with Importance Sampling

A principled importance sampling scheme is proposed that focuses computation on "informative" examples and reduces the variance of the stochastic gradients during training; the method derives a tractable upper bound on the per-sample gradient norm.
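
The mechanism above, sampling examples in proportion to an upper bound on their gradient norm while keeping the gradient estimate unbiased, is concrete enough to sketch. A minimal numpy illustration, using toy per-sample scores as a stand-in for the paper's bound (an assumption made for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
N, batch = 10_000, 128

# Toy per-sample scores standing in for the paper's upper bound on the
# per-sample gradient norm; a few "informative" examples dominate.
scores = rng.exponential(scale=1.0, size=N)

# Sample mini-batch indices in proportion to the scores.
p = scores / scores.sum()
idx = rng.choice(N, size=batch, replace=True, p=p)

# Importance weights 1 / (N * p_i) keep the gradient estimate unbiased:
# E_p[w_i * g_i] equals the uniform average of the per-sample gradients.
w = 1.0 / (N * p[idx])

# In a real training loop, these weights would scale each sampled example's
# loss (and hence its gradient) before averaging over the batch.
losses = scores                           # reuse the toy scores as "losses"
weighted_estimate = np.mean(w * losses[idx])
print(weighted_estimate, losses.mean())   # both estimate the same mean
```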

AutoAugment: Learning Augmentation Policies from Data

This paper describes a simple procedure called AutoAugment to automatically search for improved data augmentation policies, which achieves state-of-the-art accuracy on CIFAR-10, CIFAR-100, SVHN, and ImageNet (without additional data).
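
AutoAugment's search space is built from sub-policies, each a short sequence of image operations paired with a learned probability and magnitude. The sketch below samples and applies one such sub-policy with PIL; the tiny three-operation set is hypothetical, not the released search space.

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# Tiny stand-in for the searched operation set: name -> fn(image, magnitude),
# with magnitude normalized to [0, 1].
OPS = {
    "rotate":   lambda img, m: img.rotate(30 * m),
    "contrast": lambda img, m: ImageEnhance.Contrast(img).enhance(1 + m),
    "invert":   lambda img, m: ImageOps.invert(img),
}

def sample_subpolicy():
    """One sub-policy = two (operation, probability, magnitude) triples."""
    return [(random.choice(list(OPS)), random.random(), random.random())
            for _ in range(2)]

def apply_subpolicy(img, subpolicy):
    """Apply each operation with its sampled probability and magnitude."""
    for name, prob, mag in subpolicy:
        if random.random() < prob:
            img = OPS[name](img, mag)
    return img

img = Image.new("RGB", (32, 32), "gray")
augmented = apply_subpolicy(img, sample_subpolicy())
```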

Training Deep Models Faster with Robust, Approximate Importance Sampling

A robust, approximate importance sampling procedure (RAIS) for stochastic gradient descent is proposed; by approximating the ideal sampling distribution using robust optimization, RAIS provides much of the benefit of exact importance sampling with drastically reduced overhead.

Dynamic Curriculum Learning for Imbalanced Data Classification

A unified framework called Dynamic Curriculum Learning (DCL) is proposed to adaptively adjust the sampling strategy and loss weight in each batch, which yields better generalization and discrimination ability in human attribute analysis.

Learning What Data to Learn

A deep reinforcement learning framework called NDF, which uses a deep neural network to adaptively select and filter important data instances from a sequential stream of training data such that the future accumulated reward is maximized.
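
The framework treats data filtering as a sequential decision problem. Below is a minimal sketch of that idea as a REINFORCE-style Bernoulli keep/drop policy over per-instance features; the feature set and reward are placeholders, not the NDF design.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 3                      # per-instance state features (e.g., loss, margin)
policy_w = np.zeros(d)     # parameters of the keep/drop policy

def filter_step(features, reward, w, lr=0.05):
    """Keep each instance with probability sigmoid(w . x), then update the
    policy with REINFORCE: the grad of the log-prob of a Bernoulli action
    is (action - p) * x, scaled here by the observed reward."""
    p_keep = sigmoid(features @ w)
    keep = rng.random(len(features)) < p_keep
    grad = ((keep.astype(float) - p_keep)[:, None] * features).mean(axis=0)
    return keep, w + lr * reward * grad

features = rng.normal(size=(64, d))
reward = 0.1   # placeholder: e.g., validation-accuracy gain from kept data
keep_mask, policy_w = filter_step(features, reward, policy_w)
```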

Learning to Reweight Examples for Robust Deep Learning

This work proposes a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions; the method can be easily implemented on any type of deep network, requires no additional hyperparameter tuning, and achieves impressive performance on class-imbalance and corrupted-label problems where only a small amount of clean validation data is available.
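
The paper computes weights via a one-step meta-gradient through the network; the numpy sketch below uses the corresponding first-order rule, weighting each example by the clipped alignment between its gradient and the mean gradient of a small clean validation batch, on a toy linear model (all specifics are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model y = X @ theta with squared loss; a small clean
# validation batch guides the per-example weights.
d, n_train = 5, 64
true_theta = rng.normal(size=d)
theta = np.zeros(d)
X_tr = rng.normal(size=(n_train, d))
y_tr = X_tr @ true_theta + rng.normal(0, 0.1, n_train)
y_tr[:10] += 5.0                               # corrupted labels
X_val = rng.normal(size=(32, d))
y_val = X_val @ true_theta                     # clean validation data

# Per-example training gradients and the mean clean-validation gradient.
g_tr = 2 * (X_tr @ theta - y_tr)[:, None] * X_tr               # (n_train, d)
g_val = (2 * (X_val @ theta - y_val)[:, None] * X_val).mean(axis=0)

# Weight each example by the clipped, normalized alignment of its gradient
# with the validation gradient; corrupted examples land near zero.
w = np.maximum(0.0, g_tr @ g_val)
w = w / w.sum() if w.sum() > 0 else np.full(n_train, 1.0 / n_train)

theta -= 0.1 * (w[:, None] * g_tr).sum(axis=0)   # one weighted step
```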

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules

This paper introduces a new data augmentation algorithm, Population Based Augmentation (PBA), which generates nonstationary augmentation policy schedules instead of a fixed augmentation policy.

Adversarial AutoAugment

An adversarial method to arrive at a computationally affordable solution, called Adversarial AutoAugment, which can simultaneously optimize the target-related objective and the augmentation policy search loss, demonstrating significant performance improvements over the state of the art.

Online Batch Selection for Faster Training of Neural Networks

This work investigates online batch selection strategies for two state-of-the-art stochastic gradient-based optimization methods, AdaDelta and Adam, and proposes a simple strategy in which all data points are ranked w.r.t. their latest known loss value and the probability of being selected decays exponentially as a function of rank.
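
That ranking rule is concrete enough to sketch directly: selection probability decays exponentially with loss rank, so the top-ranked point is s times more likely to be drawn than the bottom-ranked one, with s the selection-pressure ratio. A minimal numpy sketch under those assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, batch = 10_000, 64

# Latest known loss for each data point, refreshed whenever it is used.
latest_loss = rng.exponential(size=N)

# Rank points by loss (rank 0 = highest); selection probability decays
# exponentially with rank, from 1 at rank 0 down to 1/s at rank N-1.
s = 100.0                                # selection-pressure ratio
order = np.argsort(-latest_loss)         # indices sorted by descending loss
ranks = np.empty(N, dtype=int)
ranks[order] = np.arange(N)
p = s ** (-ranks / (N - 1))
p /= p.sum()

batch_idx = rng.choice(N, size=batch, replace=False, p=p)
# After the forward pass, latest_loss[batch_idx] would be updated with the
# freshly computed losses and the ranking periodically recomputed.
```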

CASED: Curriculum Adaptive Sampling for Extreme Data Imbalance

The CASED learning framework makes no assumptions with regard to imaging modality or segmentation target and should generalize to other medical imaging problems where class imbalance is a persistent problem.