Knodle: Modular Weakly Supervised Learning with PyTorch

Anastasiia Sedova, Andreas Stephan, M. Speranskaya, and Benjamin Roth
Strategies for improving the training and prediction quality of weakly supervised machine learning models vary in how much they are tailored to a specific task or integrated with a specific model architecture. In this work, we introduce Knodle, a software framework that treats weak data annotations, deep learning models, and methods for improving weakly supervised training as separate, modular components. This modularization gives the training process access to fine-grained information such as… 
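The modular separation described in the abstract can be illustrated with the matrix formulation commonly used in weak supervision: a match matrix records which labeling rule fired on which instance, and a mapping matrix assigns each rule to a class. The sketch below is a generic illustration under that assumption, not Knodle's documented API; the names `Z` and `T` are used here for convenience.

```python
import numpy as np

# Z: (instances x rules) binary match matrix — which rule fired where.
Z = np.array([
    [1, 0, 1],   # instance 0 matched rules 0 and 2
    [0, 1, 0],   # instance 1 matched rule 1
    [1, 1, 0],   # instance 2 matched rules 0 and 1
])
# T: (rules x classes) rule-to-class mapping.
T = np.array([
    [1, 0],      # rule 0 -> class 0
    [0, 1],      # rule 1 -> class 1
    [1, 0],      # rule 2 -> class 0
])

votes = Z @ T                        # (instances x classes) vote counts
noisy_labels = votes.argmax(axis=1)  # majority vote; ties break to the lower class index
print(noisy_labels)                  # [0 1 0]
```

Keeping `Z`, `T`, and the downstream model as separate objects is what lets a denoising method rewrite or reweight the rule matches without touching the model code.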


KnowMAN: Weakly Supervised Multinomial Adversarial Networks
KnowMAN is an adversarial scheme that controls the influence of signals associated with specific labeling functions, forcing the network to learn representations that are invariant to those signals and to pick up other signals more generally associated with an output label.


Snorkel: Rapid Training Data Creation with Weak Supervision
Snorkel is a first-of-its-kind system that enables users to train state-of-the-art models without hand-labeling any training data; it also includes an optimizer for automating tradeoff decisions that gives up to a 1.8× speedup per pipeline execution.
Training Convolutional Networks with Noisy Labels
An extra noise layer is introduced into the network, which adapts the network outputs to match the noisy label distribution; the layer's parameters can be estimated as part of the training process, requiring only simple modifications to current training infrastructures for deep networks.
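The noise-layer idea can be sketched as a learnable noise transition matrix applied on top of the base classifier's probabilities. This is a hypothetical minimal sketch, not the paper's exact architecture; the class name and initialization are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoiseAdaptation(nn.Module):
    """Sketch of an 'extra noise layer': clean class probabilities are
    multiplied by a learnable transition matrix p(noisy | true), so the
    output matches the noisy label distribution; the matrix is trained
    jointly with the rest of the network."""

    def __init__(self, num_classes):
        super().__init__()
        # Initialise near the identity: assume labels are mostly clean.
        self.noise_logits = nn.Parameter(torch.eye(num_classes) * 5.0)

    def forward(self, clean_probs):
        # Softmax over rows makes each row a valid distribution.
        transition = F.softmax(self.noise_logits, dim=1)
        return clean_probs @ transition  # p(noisy label | input)

layer = NoiseAdaptation(num_classes=3)
clean = F.softmax(torch.randn(4, 3), dim=1)
noisy = layer(clean)
print(noisy.shape)  # torch.Size([4, 3])
```

At test time one would read predictions from the base classifier and discard the noise layer, which only exists to absorb label corruption during training.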
Confident Learning: Estimating Uncertainty in Dataset Labels
This work builds on the assumption of a classification noise process to directly estimate the joint distribution between noisy (given) labels and uncorrupted (unknown) labels, resulting in a generalized confident learning (CL) method that is provably consistent and experimentally performant.
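The counting step at the heart of confident learning can be sketched as follows. This is a simplified illustration under stated assumptions, not the full algorithm: each class gets a self-confidence threshold (the mean predicted probability of class j among examples carrying noisy label j), and examples are counted into a joint matrix by their most confident class.

```python
import numpy as np

def confident_joint(noisy_labels, pred_probs):
    """Estimate the joint between noisy and (unknown) true labels by
    counting examples whose predicted probability for a class exceeds
    that class's average self-confidence threshold."""
    n, k = pred_probs.shape
    thresholds = np.array([
        pred_probs[noisy_labels == j, j].mean() for j in range(k)
    ])
    counts = np.zeros((k, k))
    for i in range(n):
        confident = np.flatnonzero(pred_probs[i] >= thresholds)
        if confident.size:
            j = confident[np.argmax(pred_probs[i, confident])]
            counts[noisy_labels[i], j] += 1
    return counts / counts.sum()  # normalized joint estimate

probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 0, 0, 1])  # the third example looks mislabeled
joint = confident_joint(labels, probs)
print(joint)  # off-diagonal mass at [0, 1] flags the suspect label
```

Off-diagonal mass in the estimated joint identifies likely label errors, which can then be pruned or down-weighted before retraining.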
CrossWeigh: Training Named Entity Tagger from Imperfect Annotations
This study dives deep into one of the widely-adopted NER benchmark datasets, CoNLL03 NER, and proposes a simple yet effective framework, CrossWeigh, to handle label mistakes during NER model training.
Neural Relation Extraction with Selective Attention over Instances
A sentence-level attention-based model for relation extraction employs convolutional neural networks to embed the semantics of sentences and dynamically reduces the weights of noisy instances.
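The selective-attention mechanism can be sketched as scoring each sentence in a bag against a relation query vector and aggregating with softmax weights, so noisy sentences receive small weights. Names and shapes below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def selective_attention(sentence_embs, relation_query):
    """Aggregate a bag of sentence embeddings for one entity pair into a
    single bag representation, down-weighting sentences that score low
    against the relation query."""
    scores = sentence_embs @ relation_query   # (num_sentences,)
    weights = F.softmax(scores, dim=0)        # attention over the bag
    bag_repr = weights @ sentence_embs        # weighted average
    return bag_repr, weights

bag = torch.randn(5, 16)   # 5 sentences, 16-dim CNN embeddings (assumed)
query = torch.randn(16)    # learned relation embedding (assumed)
repr_, w = selective_attention(bag, query)
print(repr_.shape)         # torch.Size([16])
```

The bag representation is then fed to a relation classifier; because the weights are learned end to end, sentences that do not express the relation contribute little to the final prediction.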
Learning Whom to Trust with MACE
MACE (Multi-Annotator Competence Estimation) learns in an unsupervised fashion to identify which annotators are trustworthy and predict the correct underlying labels, and shows considerable improvements over standard baselines, both for predicted label accuracy and trustworthiness estimates.
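The mutual-reinforcement intuition behind annotator-competence estimation can be illustrated with a heavily simplified iterative scheme: alternately estimate item labels by trust-weighted voting and re-estimate each annotator's trust as agreement with the current labels. MACE itself uses an unsupervised generative model fit with EM/Variational Bayes; this loop is only a sketch of the idea.

```python
import numpy as np

def estimate_labels(annotations, num_classes, iters=10):
    """annotations: (items x annotators) matrix of class indices."""
    n_items, n_annotators = annotations.shape
    trust = np.ones(n_annotators)          # start by trusting everyone
    for _ in range(iters):
        votes = np.zeros((n_items, num_classes))
        for a in range(n_annotators):
            votes[np.arange(n_items), annotations[:, a]] += trust[a]
        labels = votes.argmax(axis=1)
        # Trust = fraction of items where the annotator agrees with the
        # current label estimates.
        trust = (annotations == labels[:, None]).mean(axis=0)
    return labels, trust

# Annotators 0 and 1 agree consistently; annotator 2 answers near-randomly.
ann = np.array([[0, 0, 1],
                [1, 1, 0],
                [0, 0, 0],
                [1, 1, 1],
                [0, 0, 1]])
labels, trust = estimate_labels(ann, num_classes=2)
print(labels, trust)  # [0 1 0 1 0] [1.  1.  0.4]
```

The unreliable annotator ends up with low trust, so its votes barely move the label estimates, which mirrors the trustworthiness estimates MACE reports.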
Distant supervision for relation extraction without labeled data
This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.
Analysing the Noise Model Error for Realistic Noisy Label Data
The quality of estimated noise models is studied from the theoretical side by deriving the expected error of the noise model, and a new noisy-label dataset from the NLP domain, obtained through a realistic distant supervision technique, is published.
Distant supervision (2021)
Are Noisy Sentences Useless for Distant Supervised Relation Extraction?
A novel method for distantly supervised relation extraction employs unsupervised deep clustering to generate reliable labels for noisy sentences; it outperforms state-of-the-art baselines on a popular benchmark dataset and can indeed alleviate the noisy-labeling problem.