• Corpus ID: 230433898

Synthetic Embedding-based Data Generation Methods for Student Performance

  • Dom Huh
  • Published 3 January 2021
  • Computer Science
  • ArXiv
In this work, we introduce a framework for synthetic data generation for academic performance prediction formulations. A common problem in these academic performance prediction datasets is that outcomes/grades are not distributed evenly, leading to class imbalance. This makes it difficult for predictive machine learning algorithms to learn important characteristics at the edges of the target class distribution. We present a general framework for synthetic embedding-based data generation (SEDG), a… 



ADASYN: Adaptive synthetic sampling approach for imbalanced learning

Simulation analyses on several machine learning data sets show the effectiveness of the ADASYN sampling approach across five evaluation metrics.

Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach

The approach does not sacrifice one class in favor of the other, but produces high predictions against both minority and majority classes, and compares well with a base classifier, a standard benchmarking boosting algorithm, and three advanced boosting-based algorithms for imbalanced data sets.

Generative Adversarial Minority Oversampling

This work proposes a three-player adversarial game between a convex generator, a multi-class classifier network, and a real/fake discriminator to perform oversampling in deep learning systems.

The balancing trick: Optimized sampling of imbalanced datasets—A brief survey of the recent State of the Art

This survey covers a plethora of conventional and recent techniques that address the problem of imbalanced class distribution through intelligent representations of samples from the majority and minority classes, which are given as input to the learning module.

Recent Trends in Deep Generative Models: a Review

  • C. G. Turhan, H. Ş. Bilge
  • Computer Science
    2018 3rd International Conference on Computer Science and Engineering (UBMK)
  • 2018
A comprehensive review of generative models, defining the relations among them, is presented for a better understanding of GANs and AEs by pointing out the importance of generative models.

SMOTE: Synthetic Minority Over-sampling Technique

A combination of over-sampling the minority (abnormal) class and under-sampling the majority class can achieve better classifier performance (in ROC space); the method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.

Unsupervised Feature Learning and Deep Learning: A Review and New Perspectives

Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, manifold learning, and deep learning.

Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction

A "budget-sensitive" progressive sampling algorithm is introduced for selecting training examples based on the class associated with each example, and it is shown that the class distribution of the resulting training set yields classifiers with good (nearly-optimal) classification performance.

Classification of Imbalanced Data: a Review

This paper reviews the classification of imbalanced data, covering the application domains, the nature of the problem, the learning difficulties with standard classifier learning algorithms, the learning objectives and evaluation measures, the reported research solutions, and the class imbalance problem in the presence of multiple classes.