Integrated Weak Learning

Peter Hayes, Mingtian Zhang, Raza Habib, Jordan Burgess, Emine Yilmaz, David Barber
We introduce Integrated Weak Learning, a principled framework that integrates weak supervision into the training process of machine learning models. Our approach jointly trains the end-model and a label model that aggregates multiple sources of weak supervision. We introduce a label model that can learn to aggregate weak supervision sources differently for different datapoints and takes into consideration the performance of the end-model during training. We show that our approach outperforms…
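
To make the idea of datapoint-dependent aggregation concrete, the following is a minimal sketch and not the paper's label model. It assumes each weak source either votes for a class or abstains, and that a per-datapoint score for each source is already available; in a learned label model those scores would be produced from the input and trained jointly with the end-model. The names `ABSTAIN`, `aggregate`, and `cross_entropy` are hypothetical and chosen only for illustration.

```python
import math

ABSTAIN = -1  # hypothetical convention: a source with no opinion votes -1


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(votes, source_scores, num_classes):
    """Combine weak votes for ONE datapoint into a soft label.

    votes         -- one class index per source, or ABSTAIN
    source_scores -- one unnormalised score per source for this datapoint
                     (assumed given here; a learned label model would
                     compute them from the input features)
    """
    active = [(v, s) for v, s in zip(votes, source_scores) if v != ABSTAIN]
    if not active:
        # No source voted: fall back to a uniform distribution.
        return [1.0 / num_classes] * num_classes
    # Weights are normalised over the sources that actually voted.
    weights = softmax([s for _, s in active])
    soft_label = [0.0] * num_classes
    for (vote, _), w in zip(active, weights):
        soft_label[vote] += w
    return soft_label


def cross_entropy(soft_label, predicted_probs, eps=1e-12):
    """Loss of end-model probabilities against the aggregated soft label."""
    return -sum(t * math.log(p + eps) for t, p in zip(soft_label, predicted_probs))
```

In a joint training loop, the soft label from `aggregate` would serve as the target for the end-model's predicted probabilities through `cross_entropy`, with gradients flowing to both the end-model and whatever produces `source_scores`. How the paper couples the two models is not specified in the truncated abstract above, so that coupling is left out of the sketch.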

Training Complex Models with Multi-Task Weak Supervision

This work shows that solving a matrix completion-style problem recovers the accuracies of multi-task weak supervision sources given their dependency structure, without any labeled data, leading to higher-quality supervision for training an end model.

Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

FlyingSquid is a weak supervision framework that runs orders of magnitude faster than previous weak supervision approaches and requires fewer assumptions. The paper also proves bounds on generalization error without assuming that the latent variable model can exactly parameterize the underlying data distribution.

WRENCH: A Comprehensive Benchmark for Weak Supervision

WRENCH is a benchmark platform for thorough and standardized evaluation of weak supervision (WS) approaches. It consists of 22 varied real-world datasets for classification and sequence tagging; a range of real, synthetic, and procedurally generated weak supervision sources; and a modular, extensible framework for WS evaluation, including implementations of popular WS methods.

Snorkel: rapid training data creation with weak supervision

Snorkel is a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. It incorporates the first end-to-end implementation of the recently proposed machine learning paradigm, data programming.

Denoising Multi-Source Weak Supervision for Neural Text Classification

This work designs a label denoiser that estimates source reliability using a conditional soft attention mechanism and then reduces label noise by aggregating rule-annotated weak labels, which addresses the rule coverage issue.

BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision

BOND is a new computational framework that leverages the power of pre-trained language models to improve the prediction performance of NER models. The paper demonstrates the superiority of BOND over existing distantly supervised NER methods.

Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning

This paper proposes a novel approach that can partially solve the problems of distant supervision for NER, applying partial annotation learning to reduce the effect of unknown labels of characters in incomplete and noisy annotations.

Learning From Incomplete and Inaccurate Supervision

This paper investigates the problem of learning from incomplete and inaccurate supervision, where only a limited subset of the training data is labeled and those labels may be noisy. It proposes novel approaches that effectively alleviate the negative influence of label noise with the help of a large amount of unlabeled data.

Learning to Reweight Examples for Robust Deep Learning

This work proposes a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. The algorithm can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.