Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods

@article{Qi2020SmallDC,
  title={Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods},
  author={Guo-Jun Qi and Jiebo Luo},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2020},
  volume={PP}
}
  • Guo-Jun Qi, Jiebo Luo
  • Published 27 March 2019
  • Medicine, Computer Science
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
Representation learning with small labeled data has emerged in many problems, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address this, many efforts have been made to train sophisticated models with few labeled data in an unsupervised or semi-supervised fashion. In this paper, we will review the recent progress on these two major categories of methods. A wide spectrum of models will be… 
A survey on Semi-, Self- and Unsupervised Techniques in Image Classification
TLDR
An overview of often-used ideas and methods in image classification with fewer labels is provided, and three major trends are identified that are scalable to real-world applications based on their accuracy.
A Survey on Deep Semi-supervised Learning
TLDR
A taxonomy for deep semi-supervised learning is presented that categorizes existing methods, including deep generative methods, consistency regularization methods, graph-based methods, pseudo-labeling methods, and hybrid methods and offers a detailed comparison of these methods in terms of the type of losses, contributions, and architecture differences.
EnAET: Self-Trained Ensemble AutoEncoding Transformations for Semi-Supervised Learning
TLDR
This study trains an Ensemble of Auto-Encoding Transformations (EnAET) to learn from both labeled and unlabeled data based on the embedded representations by decoding both spatial and non-spatial transformations under a rich family of transformations.
A survey on data-efficient algorithms in big data era
TLDR
This work investigates the issue of algorithms' data hunger, presents a comprehensive review of existing data-efficient methods and systematizes them into four categories, and delineates the limitations, discusses research challenges, and suggests future opportunities to advance research on data efficiency in machine learning.
DS³L: Deep Self-Semi-Supervised Learning for Image Recognition
TLDR
Deep Self-Semi-Supervised Learning (DS³L), a flexible multi-task framework with shared parameters that integrates the rotation task in Self-SL with the consistency-based methods in deep Semi-SL, is proposed.
Explanation Consistency Training: Facilitating Consistency-Based Semi-Supervised Learning with Interpretability
TLDR
ECT (Explanation Consistency Training) is proposed which encourages a consistent reason of model decision under data perturbation and employs model explanation as a surrogate of the causality of model output, which is able to bridge state-of-the-art interpretability to SSL models and alleviate the high complexity of causality.
Semi-Supervised GANs with Complementary Generator Pair for Retinopathy Screening
TLDR
Experimental results on three integrated public iChallenge datasets show that the proposed GBGANs could fully utilize the available fundus images to identify retinopathy with little label cost.
Combination of Active Learning and Self-Paced Learning for Deep Answer Selection with Bayesian Neural Network
TLDR
This framework proposes an uncertainty quantification method based on a Bayesian neural network, which can guide active learning and self-paced learning in the same iterative process of model training and can significantly reduce the labeled samples required for model training.
Towards Robust Model Reuse in the Presence of Latent Domains
TLDR
The MRL (Model Reuse for multiple Latent domains) method is proposed, where both domain characteristics and pre-trained models are considered for exploring instances in the target task, and these considerations are packed into a bi-level optimization framework with reliable generalization.

References

Showing 1–10 of 106 references
Regularization With Stochastic Transformations and Perturbations for Deep Semi-Supervised Learning
TLDR
An unsupervised loss function is proposed that takes advantage of the stochastic nature of such transformations and perturbations and minimizes the difference between the predictions of multiple passes of a training sample through the network.
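The consistency loss described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: `predict` is a hypothetical stand-in for a network whose forward pass is stochastic (e.g., due to dropout or random augmentation), and the loss penalizes disagreement between two passes over the same unlabeled sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(x, noise_scale=0.1):
    # Stand-in for a stochastic forward pass: a fixed softmax classifier
    # whose input is perturbed by random noise on every call.
    w = np.array([[1.0, -1.0], [0.5, 2.0]])
    logits = (x + rng.normal(0.0, noise_scale, x.shape)) @ w
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def consistency_loss(x):
    # Run the same unlabeled batch through the network twice; the two
    # predictions differ only via the stochastic perturbations, and the
    # unsupervised loss is the mean squared difference between them.
    p1, p2 = predict(x), predict(x)
    return float(np.mean((p1 - p2) ** 2))

x = rng.normal(size=(4, 2))  # a small batch of unlabeled inputs
print(consistency_loss(x))
```

No labels are used anywhere in this loss, which is what lets it be applied to the unlabeled portion of a semi-supervised training set.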
Semi-supervised Learning with Deep Generative Models
TLDR
It is shown that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.
Temporal Ensembling for Semi-Supervised Learning
TLDR
Self-ensembling is introduced, and it is shown that the ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training.
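The ensemble target above is an exponential moving average of the network's past predictions. A minimal sketch, assuming a decay of 0.6 and toy per-epoch probability vectors in place of a real network's outputs:

```python
import numpy as np

def temporal_ensemble(per_epoch_preds, alpha=0.6):
    """Accumulate an exponential moving average Z of past predictions and
    return the bias-corrected training targets Z / (1 - alpha**t)."""
    Z = np.zeros_like(per_epoch_preds[0])
    targets = []
    for t, p in enumerate(per_epoch_preds, start=1):
        Z = alpha * Z + (1 - alpha) * p       # accumulate ensemble prediction
        targets.append(Z / (1 - alpha ** t))  # correct the startup bias toward zero
    return targets

# Toy predictions for one unlabeled sample over three epochs.
preds = [np.array([0.2, 0.8]), np.array([0.4, 0.6]), np.array([0.3, 0.7])]
print(temporal_ensemble(preds)[-1])
```

The bias correction matters early in training: without it, the zero-initialized average drags the targets toward zero; with it, each target remains a proper probability vector averaging the epochs seen so far.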
VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning
TLDR
VEEGAN is introduced, which features a reconstructor network, reversing the action of the generator by mapping from data to noise, and resists mode collapsing to a far greater extent than other recent GAN variants, and produces more realistic samples.
Semi-supervised learning with graphs
TLDR
A series of novel semi-supervised learning approaches arising from a graph representation are presented, where labeled and unlabeled instances are represented as vertices and edges encode the similarity between instances.
Learning Multiple Layers of Features from Tiny Images
TLDR
It is shown how to train a multi-layer generative model that learns to extract meaningful features which resemble those found in the human visual cortex, using a novel parallelization algorithm to distribute the work among multiple machines connected on a network.
Improved Techniques for Training GANs
TLDR
This work focuses on two applications of GANs: semi-supervised learning, and the generation of images that humans find visually realistic, and presents ImageNet samples with unprecedented resolution and shows that the methods enable the model to learn recognizable features of ImageNet classes.
Adversarial Autoencoders
TLDR
This paper shows how the adversarial autoencoder can be used in applications such as semi-supervised classification, disentangling style and content of images, unsupervised clustering, dimensionality reduction, and data visualization, and performs experiments on the MNIST, Street View House Numbers, and Toronto Face datasets.
Learning Disentangled Representations with Semi-Supervised Deep Generative Models
TLDR
This work proposes to learn disentangled representations that encode distinct aspects of the data into separate variables using model architectures that generalise from standard VAEs, employing a general graphical model structure in the encoder and decoder.
Recent Advances in Zero-shot Recognition
TLDR
This article provides a comprehensive review of existing zero-shot recognition techniques covering various aspects ranging from representations of models, and from datasets and evaluation settings and highlights the limitations of existing approaches.