Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study

  • Guowei Xu, Wenbiao Ding, Jiliang Tang, Songfan Yang, Gale Yan Huang, Zitao Liu
  • Published 8 April 2019
  • Computer Science, Mathematics
  • 2019 IEEE 35th International Conference on Data Engineering (ICDE)
Learning representation has been proven to be helpful in numerous machine learning tasks. [...] The proposed representation learning framework is evaluated in two real-world education applications. The experimental results demonstrate the benefits of our approach for learning representations from limited crowd-labeled data, and show that RLL outperforms state-of-the-art baselines. Moreover, detailed experiments are conducted on RLL to fully understand its key components and the…
Representation Learning from Limited Educational Data with Crowdsourced Labels
A grouping-based deep neural network is designed to learn embeddings from a limited number of training samples, and a Bayesian confidence estimator is presented to capture the inconsistency among crowdsourced labels.
Temporal-aware Language Representation Learning From Crowdsourced Labels
TACMA, a temporal-aware language representation learning heuristic for crowdsourced labels with multiple annotators, is proposed and shown to outperform a wide range of state-of-the-art baselines in terms of prediction accuracy and AUC.
CrowdRL: An End-to-End Reinforcement Learning Framework for Data Labelling
CrowdRL is the first RL framework designed for the data labelling workflow: it seamlessly integrates task selection, task assignment and truth inference, and fully utilizes the power of heterogeneous annotators (experts and crowdsourcing workers) together with machine learning models to infer the truth, which greatly improves the quality of data labelling.
Recent Advances in Multimodal Educational Data Mining in K-12 Education
This tutorial targets AI researchers and practitioners who are interested in applying state-of-the-art multimodal machine learning techniques to tackle some of the hard-core AIED tasks.
Dolphin: A Spoken Language Proficiency Assessment System for Elementary Education
Dolphin, a spoken language proficiency assessment system for Chinese elementary education, is developed, and it is shown that Dolphin improves both phonological fluency and semantic relevance evaluation performance when compared to state-of-the-art baselines on real-world educational data sets.
Classifying process deviations with weak supervision
The Snorkel framework, which uses a set of imperfect domain expert rules, is applied to classify the set of deviations into anomalies and exceptions, allowing auditors to do a full-population analysis of the identified deviations.
NeuCrowd: Neural Sampling Network for Representation Learning with Crowdsourced Labels
The proposed framework creates a sufficient number of high-quality n-tuplet training samples by utilizing safety-aware sampling and robust anchor generation, and automatically learns a neural sampling network that adaptively learns to select effective samples for SRL networks.
Mathematical Word Problem Generation from Commonsense Knowledge Graph and Equations
An end-to-end neural model is developed to generate personalized and diverse MWPs in real-world scenarios from a commonsense knowledge graph and equations; it outperforms the state-of-the-art models in terms of automatic evaluation metrics, i.e., equation relevance, topic relevance, and language coherence.


Learning From Crowds
A probabilistic approach is proposed for supervised learning when multiple annotators provide (possibly noisy) labels but there is no absolute gold standard; experimental results indicate that the proposed method is superior to the commonly used majority voting baseline.
Learning Supervised Topic Models for Classification and Regression from Crowds
This article proposes two supervised topic models, one for classification and another for regression problems, which account for the heterogeneity and biases among different annotators that are encountered in practice when learning from crowds, and develops an efficient stochastic variational inference algorithm.
AggNet: Deep Learning From Crowds for Mitosis Detection in Breast Cancer Histology Images
An experimental study on learning from crowds that handles data aggregation directly as part of the learning process of the convolutional neural network (CNN) via an additional crowdsourcing layer (AggNet); it gives valuable insights into the functionality of deep CNN learning from crowd annotations and proves the necessity of integrating data aggregation.
Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise
A probabilistic model is presented and it is demonstrated that the model outperforms the commonly used "Majority Vote" heuristic for inferring image labels, and is robust to both noisy and adversarial labelers.
Data Programming: Creating Large Training Sets, Quickly
A paradigm for the programmatic creation of training sets called data programming is proposed in which users express weak supervision strategies or domain heuristics as labeling functions, which are programs that label subsets of the data, but that are noisy and may conflict.
Gaussian Process Classification and Active Learning with Multiple Annotators
This paper generalizes GP classification in order to account for multiple annotators with different levels of expertise, and empirically shows that the model significantly outperforms other commonly used approaches, such as majority voting, without a significant increase in the computational cost of approximate Bayesian inference.
Learning to Compare: Relation Network for Few-Shot Learning
A conceptually simple, flexible, and general framework for few-shot learning, where a classifier must learn to recognise new classes given only a few examples from each, which is easily extended to zero-shot learning.
Who Said What: Modeling Individual Labelers Improves Classification
This work proposes modeling the experts individually and then learning averaging weights for combining them, possibly in sample-specific ways, to give more weight to more reliable experts and take advantage of the unique strengths of individual experts at classifying certain types of data.
Get another label? improving data quality and data mining using multiple, noisy labelers
The results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
Representation Learning: A Review and New Perspectives
Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.