• Corpus ID: 235422386

Semi-supervised Active Regression

Fnu Devvrit, Nived Rajaraman, Pranjal Awasthi
Labelled data often comes at a high cost as it may require recruiting human labelers or running costly experiments. At the same time, in many practical scenarios, one already has access to a partially labelled, potentially biased dataset that can help with the learning task at hand. Motivated by such settings, we formally initiate a study of semi-supervised active learning through the frame of linear regression. In this setting, the learner has access to a dataset $X \in \mathbb{R}^{(n_1+n_2) \times d}$ which is…



Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

This paper presents an empirical study of semi-supervised learning techniques across a variety of datasets and examines questions such as the effect of independence or relevance among features, the effect of the sizes of the labeled and unlabeled sets, and the effect of noise.

A probabilistic approach towards an unbiased semi-supervised cluster tree

Active Learning Literature Survey

This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.

Consistency-Based Semi-Supervised Active Learning: Towards Minimizing Labeling Cost

Proposes a consistency-based sample selection metric that is coherent with the training objective, so that the selected samples are effective at improving model performance, along with a measure that is empirically correlated with the AL target loss and is potentially useful for determining the proper starting point of learning-based AL methods.

Active learning with support vector machines

Different query strategies for selecting informative data points are discussed, and the ways in which these strategies give rise to different variants of active learning with SVMs are reviewed.

Optimal Deterministic Coresets for Ridge Regression

The ridge regression problem is considered, and a deterministic protocol for ridge regression with $O(sd_\lambda/\epsilon)$ words of communication per server in a distributed setting is given, in the important case when the rows of A and B have a constant number of non-zero entries.

Fairness in Semi-Supervised Learning: Unlabeled Data Help to Reduce Discrimination

A framework for fair semi-supervised learning in the pre-processing phase is presented, comprising pseudo labeling to predict labels for unlabeled data, a re-sampling method to obtain multiple fair datasets, and ensemble learning to improve accuracy and decrease discrimination.

Active Learning for Convolutional Neural Networks: A Core-Set Approach

This work defines the problem of active learning as core-set selection, i.e., choosing a set of points such that a model learned over the selected subset is competitive for the remaining data points, and presents a theoretical result characterizing the performance of any selected subset using the geometry of the data points.

Learning classifiers from only positive and unlabeled data

This paper shows that models trained using the new methods perform better than the current state-of-the-art biased SVM method for learning from positive and unlabeled examples, and applies them to solve a real-world problem: identifying protein records that should be included in an incomplete specialized molecular biology database.

Two faces of active learning