Job Offers Classifier using Neural Networks and Oversampling Methods

Germán Ortiz, Gemma Bel Enguix, Helena Gómez-Adorno, Iqra Ameer, Grigori Sidorov
Both policy and research benefit from a better understanding of individuals’ jobs. However, as large-scale administrative records are increasingly employed to represent labor market activity, new automatic methods to classify jobs will become necessary. We developed an automatic job offers classifier using a dataset collected from the largest job bank of Mexico, known as Bumeran. We applied machine learning algorithms such as Support Vector Machines, Naive Bayes, Logistic Regression, Random…




LSTM Recurrent Neural Networks for Short Text and Sentiment Classification

This work demonstrates how to classify text using Long Short-Term Memory (LSTM) networks and their modifications, i.e. the Bidirectional LSTM network and the Gated Recurrent Unit, and presents the superiority of this method over other algorithms for text classification.

Handling imbalanced data in churn prediction using ADASYN and backpropagation algorithm

The backpropagation algorithm is chosen as the classification model, and ADASYN (Adaptive Synthetic Sampling) is used as the oversampling method to address the class imbalance problem.
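ADASYN's distinguishing step is its density distribution: minority samples surrounded by more majority-class neighbors are considered harder to learn and receive proportionally more synthetic samples. A minimal NumPy sketch of that weighting step (function name and parameters are illustrative, not taken from the paper):

```python
import numpy as np

def adasyn_weights(X, y, minority=1, k=5):
    """For each minority sample, compute the fraction of its k nearest
    neighbors that belong to the majority class, then normalize so the
    weights sum to 1. ADASYN allocates synthetic samples according to
    this distribution: harder minority points get more of them."""
    X_min = X[y == minority]
    r = []
    for x in X_min:
        d = np.linalg.norm(X - x, axis=1)
        nn = np.argsort(d)[1:k + 1]          # skip the point itself
        r.append(np.mean(y[nn] != minority))  # fraction of majority neighbors
    r = np.array(r)
    return r / r.sum() if r.sum() > 0 else np.full(len(r), 1 / len(r))
```

Given these weights, the synthetic points themselves are generated by SMOTE-style interpolation among minority neighbors.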

Imbalanced Learning in Land Cover Classification: Improving Minority Classes' Prediction Accuracy Using the Geometric SMOTE Algorithm

This paper proposes Geometric-SMOTE, a novel oversampling method, as a tool for addressing the imbalanced learning problem in remote sensing, and indicates that, when using imbalanced datasets, remote sensing researchers should consider using these new-generation oversamplers to increase the quality of the classification results.

SMOTE: Synthetic Minority Over-sampling Technique

A combination of over-sampling the minority (abnormal) class and under-sampling the majority class can achieve better classifier performance (in ROC space); this combination is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
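SMOTE's generation step is simple enough to sketch in a few lines of NumPy: each synthetic point is an interpolation between a minority sample and one of its k nearest minority-class neighbors. Names and parameters below are illustrative; a production system would use an established implementation such as imbalanced-learn:

```python
import numpy as np

def smote_sample(X_min, k=3, n_new=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    a random minority point and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbor
    neighbors = np.argsort(d, axis=1)[:, :k]    # k nearest neighbors per point
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)                     # pick a minority sample
        j = neighbors[i, rng.integers(k)]       # pick one of its neighbors
        gap = rng.random()                      # interpolation factor in [0, 1]
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Because every synthetic point lies on a segment between two real minority samples, the generated data stays inside the convex hull of the minority class.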

Recurrent Convolutional Neural Networks for Text Classification

A recurrent convolutional neural network is introduced for text classification without human-designed features to capture contextual information as far as possible when learning word representations, which may introduce considerably less noise compared to traditional window-based neural networks.

Learning From Imbalanced Data

This chapter aims to address the need, challenges, existing methods, and evaluation metrics identified when learning from imbalanced data sets.

Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions

A novel machine learning framework based on recursive autoencoders for sentence-level prediction of sentiment label distributions that outperform other state-of-the-art approaches on commonly used datasets, without using any pre-defined sentiment lexica or polarity shifting rules.

Geometric SMOTE: Effective oversampling for imbalanced learning through a geometric extension of SMOTE

This paper proposes Geometric SMOTE (G-SMOTE) as a generalization of the SMOTE data generation mechanism, and presents empirical results that show a significant improvement in the quality of the generated data when G-SMOTE is used as an oversampling algorithm.
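The geometric extension can be sketched as follows: where SMOTE samples on the line segment to a neighbor, G-SMOTE draws from a geometric region around the minority sample, in the simplest case a hypersphere whose radius is the distance to the selected neighbor. A minimal illustrative sketch (ignoring G-SMOTE's truncation and deformation factors; names are hypothetical):

```python
import numpy as np

def gsmote_sample(center, neighbor, rng=None):
    """Draw one synthetic point uniformly from the hypersphere centered on
    a minority sample, with radius equal to the distance to its neighbor."""
    rng = np.random.default_rng(rng)
    radius = np.linalg.norm(neighbor - center)
    direction = rng.normal(size=center.shape)
    direction /= np.linalg.norm(direction)          # uniform random direction
    r = radius * rng.random() ** (1 / len(center))  # uniform radius within the ball
    return center + r * direction
```

Sampling the whole ball instead of a single segment spreads the synthetic points around the minority sample, which is the intuition behind G-SMOTE's improved data quality.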

CRAN: A Hybrid CNN-RNN Attention-Based Model for Text Classification

This paper proposes CRAN, a hybrid CNN-RNN attention-based neural network that effectively combines a convolutional neural network and a recurrent neural network with the help of an attention mechanism, and demonstrates its effectiveness and efficiency.

Improving Imbalanced Question Classification Using Structured Smote Based Approach

The proposed framework is grammar-based: it uses the grammatical pattern of each question together with machine learning algorithms to classify questions, and demonstrates a good level of accuracy in identifying different question types and in handling class imbalance.