A survey on semi-supervised learning
Jesper E. van Engelen and Holger H. Hoos
Machine Learning, pages 373–440
Semi-supervised learning is the branch of machine learning concerned with using labelled as well as unlabelled data to perform certain learning tasks. Conceptually situated between supervised and unsupervised learning, it permits harnessing the large amounts of unlabelled data available in many use cases in combination with typically smaller sets of labelled data. In recent years, research in this area has followed the general trends observed in machine learning, with much attention directed at… 

Boosting the Performance of Semi-Supervised Learning with Unsupervised Clustering

It is shown that ignoring the labels altogether for whole epochs intermittently during training can significantly improve performance in the small sample regime, and the method's efficacy in boosting several state-of-the-art SSL algorithms is demonstrated.

An Efficient Approach to Select Instances in Self-Training and Co-Training Semi-supervised Methods

Three methods are proposed for automating the labeling process of unlabeled instances in semi-supervised learning and all three methods perform better than the original self-training and co-training methods, in most analysed cases.
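The self-training loop that such instance-selection methods build on can be sketched as follows. This is a minimal illustration only: the nearest-centroid classifier, the softmax-over-distances confidence score, and the threshold value are assumptions made here for the sketch, not the selection criteria proposed in the paper.

```python
import numpy as np

def self_train(X_lab, y_lab, X_unl, threshold=0.8, max_iters=10):
    """Generic self-training: repeatedly fit a classifier on the labeled
    pool, pseudo-label the most confident unlabeled points, and move them
    into the labeled pool. Uses a toy nearest-centroid classifier whose
    'confidence' is a softmax over negative squared class-centroid distances."""
    X_lab, y_lab, X_unl = X_lab.copy(), y_lab.copy(), X_unl.copy()
    for _ in range(max_iters):
        if len(X_unl) == 0:
            break
        classes = np.unique(y_lab)
        centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in classes])
        d = ((X_unl[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        p = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)  # softmax confidence
        conf, pred = p.max(axis=1), classes[p.argmax(axis=1)]
        keep = conf >= threshold
        if not keep.any():
            break  # nothing confident enough: stop pseudo-labeling
        X_lab = np.vstack([X_lab, X_unl[keep]])
        y_lab = np.concatenate([y_lab, pred[keep]])
        X_unl = X_unl[~keep]
    return X_lab, y_lab
```

The selection methods surveyed in the paper replace the fixed confidence threshold above with automated criteria for deciding which pseudo-labels are trustworthy.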

Classification of acoustical signals by combining active learning strategies with semi-supervised learning schemes

The proposed combinatory framework operates on training sets of small cardinality; the results demonstrate the benefits of such semi-automated approaches in terms of predictive accuracy under reduced resource consumption, as well as the smoothness of the learning convergence.

On tuning a mean-field model for semi-supervised classification

This work focuses on the task of transduction with a mean-field approximation to the Potts model and proposes a tuning approach based on a novel parameter γ that allows NMF to outperform other approaches in datasets with fewer classes.

Dealing With Multipositive Unlabeled Learning Combining Metric Learning and Deep Clustering

Experimental evaluations on real-world benchmarks against recent MPUL competitors demonstrate that the proposed framework achieves state-of-the-art performance, supporting the validity of the approach.

Semi-supervised Predictive Clustering Trees for (Hierarchical) Multi-label Classification

A (hierarchical) multi-label classification method based on semi-supervised learning of predictive clustering trees that preserves interpretability and reduces the time complexity of classical tree-based models is proposed.


  • Klym Yamkovyi
  • Computer Science
    Bulletin of National Technical University "KhPI". Series: System Analysis, Control and Information Technologies
  • 2021
It was shown that even small amounts of labeled data suffice for semi-supervised learning, and the proposed modifications improve both accuracy and algorithm performance, as demonstrated in experiments.

A review of various semi-supervised learning models with a deep learning and memory approach

Memory-based neural networks are new neural-network models that can be used in this area, exploiting memory to strengthen the effect of semi-supervised learning.

Active learning for hierarchical multi-label classification

A public framework containing baseline and state-of-the-art algorithms suitable for hierarchical multi-label classification is provided, and a new algorithm, Hierarchical Query-By-Committee (H-QBC), is proposed and validated on datasets from different domains.

Dash: Semi-Supervised Learning with Dynamic Thresholding

The proposed approach, Dash, is adaptive in its selection of unlabeled data and comes with a theoretical guarantee: its convergence rate is established from the viewpoint of non-convex optimization.
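Dash's core idea, admitting an unlabeled example only when its pseudo-label loss falls below a threshold that tightens as training proceeds, can be sketched roughly as below. The geometric decay schedule and the values of rho1 and gamma are hypothetical stand-ins, not the schedule derived in the paper.

```python
import numpy as np

def select_unlabeled(losses, t, rho1=2.0, gamma=1.3):
    """Dynamic-threshold selection (illustrative): keep an unlabeled example
    only if its pseudo-label loss falls below a threshold that decays
    geometrically with the epoch t (t >= 1), so selection grows stricter
    as the model improves. rho1 and gamma are hypothetical values."""
    rho_t = rho1 * gamma ** (-(t - 1))  # threshold for epoch t
    return losses < rho_t
```

Early in training almost everything passes the threshold; by later epochs only low-loss (high-confidence) pseudo-labels survive.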

Introduction to Semi-Supervised Learning

This introductory book presents some popular semi-supervised learning models, including self-training, mixture models, co-training and multiview learning, graph-based methods, and semi-supervised support vector machines, and discusses their basic mathematical formulation.

SemiBoost: Boosting for Semi-Supervised Learning

A boosting framework for semi-supervised learning, termed SemiBoost, improves the performance of several commonly used supervised learning algorithms given a large number of unlabeled examples, and is comparable to state-of-the-art semi-supervised learning algorithms.

Semi-Supervised Learning

This first comprehensive overview of semi-supervised learning presents state-of-the-art algorithms, a taxonomy of the field, selected applications, benchmark experiments, and perspectives on ongoing and future research.

Enhancing Supervised Learning with Unlabeled Data

A new semi-supervised learning method called co-learning is designed to use unlabeled data to enhance standard supervised learning algorithms, leveraging the fact that the learners have different representations of the hypotheses and are likely to detect different patterns in the labeled data.

Semi-Supervised Learning via Regularized Boosting Working on Multiple Semi-Supervised Assumptions

  • Ke Chen, Shihai Wang
  • Computer Science
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2011
This paper proposes a novel cost functional consisting of the margin cost on labeled data and a regularization penalty on unlabeled data based on three fundamental semi-supervised assumptions, and demonstrates that the algorithm yields favorable results on benchmark and real-world classification tasks compared with state-of-the-art semi-supervised learning algorithms, including newly developed boosting algorithms.

Semi-supervised learning with graphs

A series of novel semi-supervised learning approaches arising from a graph representation, where labeled and unlabeled instances are represented as vertices, and edges encode the similarity between instances are presented.

Graph-Based Semi-Supervised Learning

This synthesis lecture focuses on graph-based SSL algorithms (e.g., label propagation methods), which have been shown to outperform the state-of-the-art in many applications in speech processing, computer vision, natural language processing, and other areas of Artificial Intelligence.
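A minimal version of the label propagation scheme mentioned here can be written with the standard iterate-and-clamp formulation on a row-normalized affinity matrix; the graph weights and iteration count in this sketch are illustrative assumptions.

```python
import numpy as np

def label_propagation(W, y, labeled_mask, n_iters=100):
    """Iterative label propagation (a standard graph-based SSL scheme):
    each node repeatedly takes the row-normalized weighted average of its
    neighbours' label distributions, while labeled nodes are clamped back
    to their known labels after every step."""
    n = W.shape[0]
    classes = np.unique(y[labeled_mask])
    F = np.zeros((n, len(classes)))
    for i, c in enumerate(classes):
        F[labeled_mask & (y == c), i] = 1.0
    P = W / W.sum(axis=1, keepdims=True)  # row-stochastic transition matrix
    Y0 = F.copy()
    for _ in range(n_iters):
        F = P @ F                       # diffuse labels along graph edges
        F[labeled_mask] = Y0[labeled_mask]  # clamp the labeled nodes
    return classes[F.argmax(axis=1)]
```

Unlabeled nodes end up with the label of the cluster they are most strongly connected to, which is exactly the smoothness assumption these graph-based methods encode.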

Semi-supervised classification trees

A semi-supervised classification tree induction algorithm that can exploit both the labelled and unlabeled data, while preserving all of the appealing characteristics of standard supervised decision trees: being non-parametric, efficient, having good predictive performance and producing readily interpretable models.

Semi-Supervised Random Forests

This work develops a novel multi-class margin definition for the unlabeled data, and proposes a control mechanism based on the out-of-bag error, which prevents the algorithm from degradation if the unlabeled data is not useful for the task.

Semi-Supervised Regression with Co-Training

COREG, a co-training-style semi-supervised regression algorithm, is proposed; experiments show that it can effectively exploit unlabeled data to improve regression estimates.
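One round of the two-view idea behind such co-training regression might look as follows. This is a simplified sketch: the two views are kNN regressors that differ only in distance metric, and cross-view agreement is used as a stand-in confidence measure, whereas COREG's actual selection criterion is the reduction in error on the labeled set.

```python
import numpy as np

def knn_predict(X_tr, y_tr, x, k=2, p=2):
    """k-nearest-neighbour regression under a Minkowski-p distance."""
    d = (np.abs(X_tr - x) ** p).sum(axis=1) ** (1.0 / p)
    return y_tr[np.argsort(d)[:k]].mean()

def coreg_round(X_lab, y_lab, X_unl, k=2):
    """One illustrative round of co-training-style regression: two kNN
    regressors differing only in metric (p=1 vs p=2) each pseudo-label
    every unlabeled point; the point on which the two views agree most
    closely is moved into the labeled pool with the averaged pseudo-label."""
    preds = np.array([
        [knn_predict(X_lab, y_lab, x, k, p) for x in X_unl]
        for p in (1, 2)
    ])
    gap = np.abs(preds[0] - preds[1])     # cross-view disagreement
    i = int(gap.argmin())                 # most-agreed-upon point
    X_lab = np.vstack([X_lab, X_unl[i:i + 1]])
    y_lab = np.append(y_lab, preds[:, i].mean())
    X_unl = np.delete(X_unl, i, axis=0)
    return X_lab, y_lab, X_unl
```

Repeating such rounds until the views stop agreeing (or the unlabeled pool is exhausted) yields the iterative scheme that co-training regression methods refine.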