Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers

  title={Influence-Driven Data Poisoning in Graph-Based Semi-Supervised Classifiers},
  author={Adriano Franci and Maxime Cordy and Martin Gubri and Mike Papadakis and Yves Le Traon},
  journal={2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN)},
  • Adriano Franci, Maxime Cordy, Yves Le Traon
  • Published 14 December 2020
  • Computer Science
  • 2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN)
Graph-based Semi-Supervised Learning (GSSL) is a practical solution to learn from a limited amount of labelled data together with a vast amount of unlabelled data. However, due to their reliance on the known labels to infer the unknown labels, these algorithms are sensitive to data quality. It is therefore essential to study the potential threats related to the labelled data, more specifically, label poisoning. In this paper, we propose a novel data poisoning method which efficiently… 

Rethinking Backdoor Data Poisoning Attacks in the Context of Semi-Supervised Learning

It is shown that simple poisoning attacks that follow the distribution of the poisoned samples’ predicted labels are highly effective, achieving an average attack success rate as high as 96 .



A Unified Framework for Data Poisoning Attack to Graph-based Semi-supervised Learning

This framework first unifies different tasks, goals and constraints into a single formula for data poisoning attacks in G-SSL, then proposes two specialized algorithms to efficiently solve two important cases: poisoning regression tasks under an $\ell_2$-norm constraint and classification tasks under an $\ell_0$-norm constraint.

Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Supervised Learning

It is proved that for basic hypothesis classes over the real line, if the distribution of unlabeled data is ‘smooth’, knowledge of that distribution cannot improve the labeled sample complexity by more than a constant factor.

Label Propagation for Deep Semi-Supervised Learning

This work employs a transductive label propagation method that is based on the manifold assumption to make predictions on the entire dataset and use these predictions to generate pseudo-labels for the unlabeled data and train a deep neural network.

Safe semi-supervised learning: a brief introduction

This article reviews research progress in safe semi-supervised learning, focusing on three types of safeness issue: data quality, where the training data is risky or of low quality; model uncertainty, where the learning algorithm fails to handle the uncertainty during training; and measure diversity, where the safe performance could be adapted to diverse measures.

Semi-Supervised Learning in Gigantic Image Collections

This paper uses the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators to obtain highly efficient approximations for semi-supervised learning that are linear in the number of images.

Label Sanitization against Label Flipping Poisoning Attacks

This paper proposes an efficient algorithm to perform optimal label flipping poisoning attacks and a mechanism to detect and relabel suspicious data points, mitigating the effect of such poisoning attacks.

Generalization Error Bounds Using Unlabeled Data

Two new methods for obtaining generalization error bounds in a semi-supervised setting based on approximating the disagreement probability of pairs of classifiers using unlabeled data are presented.

Learning from Labeled and Unlabeled Data

  • M. Seeger
  • Computer Science
    Encyclopedia of Machine Learning
  • 2010
A rigorous definition of the problem is given and the crucial role of prior knowledge is put forward, and the important notion of input-dependent regularization is discussed.

Learning from labeled and unlabeled data with label propagation

A simple iterative algorithm to propagate labels through the dataset along high-density areas defined by unlabeled data is proposed, its solution is analyzed, and its connection to several other algorithms is discussed.
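
The iterative scheme summarized above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a fully connected graph with Gaussian affinities, a row-normalized transition matrix, and clamping of the known labels after every step. The function name `propagate_labels` and the parameters `sigma` and `n_iter` are hypothetical.

```python
import numpy as np

def propagate_labels(X, y, n_classes, sigma=1.0, n_iter=200):
    """Illustrative label propagation; y uses -1 for unlabelled points."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Gaussian (RBF) affinities between all pairs of points (assumed kernel).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Row-normalise the affinities into a transition matrix.
    P = W / W.sum(axis=1, keepdims=True)

    labelled = y >= 0
    # Start from uniform class scores, then clamp the known labels.
    F = np.full((len(y), n_classes), 1.0 / n_classes)
    clamp = np.eye(n_classes)[y[labelled]]
    F[labelled] = clamp
    for _ in range(n_iter):
        F = P @ F              # each node averages its neighbours' scores
        F[labelled] = clamp    # known labels are reset after every step
    return F.argmax(axis=1)

# Two well-separated 1-D clusters with one labelled point each.
X = np.array([[0.0], [0.2], [0.4], [3.0], [3.2], [3.4]])
y = np.array([0, -1, -1, 1, -1, -1])
pred = propagate_labels(X, y, n_classes=2)
```

Because every prediction is an average anchored on the clamped labels, changing a single known label shifts the scores of all nearby unlabelled points, which is why label quality matters so much in this setting.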

Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions

An approach to semi-supervised learning is proposed that is based on a Gaussian random field model, and methods to incorporate class priors and the predictions of classifiers obtained by supervised learning are discussed.
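
As a hedged illustration of the harmonic-function idea, the sketch below computes the closed-form solution commonly associated with this approach, f_u = (D_uu - W_uu)^{-1} W_ul f_l, for a binary problem. It is a minimal sketch, not the authors' implementation: the function name `harmonic_solution` and the toy chain graph are assumptions, and class priors and external classifiers are left out.

```python
import numpy as np

def harmonic_solution(W, labelled_idx, f_l):
    """Solve for unlabelled scores so each equals the weighted mean of its neighbours."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    labelled_idx = np.asarray(labelled_idx)
    unlabelled_idx = np.setdiff1d(np.arange(n), labelled_idx)
    # Combinatorial graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    L_uu = L[np.ix_(unlabelled_idx, unlabelled_idx)]
    W_ul = W[np.ix_(unlabelled_idx, labelled_idx)]
    # Solve L_uu f_u = W_ul f_l instead of forming an explicit inverse.
    f_u = np.linalg.solve(L_uu, W_ul @ np.asarray(f_l, dtype=float))
    return unlabelled_idx, f_u

# Toy chain graph 0 - 1 - 2 - 3 with the two end nodes labelled 0 and 1.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
idx, f_u = harmonic_solution(W, [0, 3], [0.0, 1.0])
```

On this chain the unlabelled scores interpolate linearly between the two labelled ends; thresholding them at 0.5 gives a class assignment.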