Density-based weighting for imbalanced regression

@article{Steininger2021DensitybasedWF,
  title={Density-based weighting for imbalanced regression},
  author={Michael Steininger and Konstantin Kobs and Padraig Davidson and Anna Krause and Andreas Hotho},
  journal={Machine Learning},
  year={2021},
  volume={110},
  pages={2187--2211}
}
In many real-world settings, imbalanced data impedes the performance of learning algorithms such as neural networks, mostly on rare cases. This is especially problematic for tasks focusing on these rare occurrences. For example, when estimating precipitation, extreme rainfall events are scarce but important given their potential consequences. While there are numerous well-studied solutions for classification settings, most of them cannot be easily applied to regression. Of the few…
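
The weighting idea can be sketched in a few lines: estimate the label density with a kernel density estimator and down-weight common targets. This is a minimal illustration in the spirit of the paper's DenseWeight scheme; the normalization details and the role of `alpha` are simplified assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def density_based_weights(y, alpha=1.0, eps=1e-6):
    """Weight each target inversely to its estimated label density.

    alpha controls weighting strength (alpha=0 gives uniform weights);
    eps keeps every weight strictly positive.
    """
    kde = gaussian_kde(y)                 # estimate p(y) with a Gaussian KDE
    dens = kde(y)
    dens = (dens - dens.min()) / (dens.max() - dens.min() + 1e-12)  # to [0, 1]
    w = np.maximum(1.0 - alpha * dens, eps)   # rare targets get larger weights
    return w / w.mean()                       # rescale so the mean weight is 1

# Usage: plug the weights into any weighted loss, e.g. a weighted MSE.
y_train = np.random.exponential(scale=2.0, size=1000)   # skewed targets
w = density_based_weights(y_train, alpha=1.0)
# loss = np.mean(w * (y_pred - y_train) ** 2)
```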

RankSim: Ranking Similarity Regularization for Deep Imbalanced Regression

RankSim is complementary to conventional imbalanced learning techniques, including re-weighting, two-stage training, and distribution smoothing, and lifts the state-of-the-art performance on three imbalanced regression benchmarks: IMDB-WIKI-DIR, AgeDB-DIR, and STS-B-DIR.
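
As a rough illustration of the ranking-similarity idea, the sketch below penalizes disagreement between neighborhood rankings in feature space and in label space. The paper differentiates through exact ranking with a blackbox combinatorial solver; the sigmoid-based soft rank here is a stand-in assumption to keep the example self-contained.

```python
import torch
import torch.nn.functional as F

def soft_rank(x, tau=0.1):
    # x: (B, B), each row a vector of scores; returns differentiable
    # approximate ranks (count of entries each score exceeds, softened).
    diff = x.unsqueeze(-1) - x.unsqueeze(-2)       # (B, B, B) pairwise gaps per row
    return torch.sigmoid(diff / tau).sum(dim=-1)   # approx. rank of each entry

def ranksim_loss(features, labels, tau=0.1):
    z = F.normalize(features, dim=1)
    feat_sim = z @ z.T                                            # cosine similarities
    label_sim = -torch.abs(labels.view(-1, 1) - labels.view(1, -1)).float()
    # Penalize mismatch between the two neighborhood rankings.
    return F.mse_loss(soft_rank(feat_sim, tau), soft_rank(label_sim, tau))

# Usage: total = task_loss + gamma * ranksim_loss(backbone(x), y)
```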

Balanced MSE for Imbalanced Visual Regression

This work identifies that the widely used Mean Square Error (MSE) loss function can be ineffective in imbalanced regression and proposes a novel loss function, Balanced MSE, to accommodate the imbalanced training label distribution.
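
One published form of Balanced MSE is the batch-based Monte Carlo (BMC) variant, which recasts each batch as a classification problem over candidate targets. The sketch below follows that formulation; `noise_var` is a hyperparameter (or a learnable scalar) standing in for the label noise variance.

```python
import torch
import torch.nn.functional as F

def balanced_mse_bmc(pred, target, noise_var=1.0):
    # pred, target: (B, 1). Every target in the batch acts as a candidate
    # "class"; logits are scaled negative squared distances.
    logits = -(pred - target.T).pow(2) / (2 * noise_var)   # (B, B)
    labels = torch.arange(pred.size(0), device=pred.device)
    # Cross-entropy picks out the true pairing, implicitly re-balancing
    # against the batch's empirical label distribution.
    return F.cross_entropy(logits, labels) * (2 * noise_var)
```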

Anomaly Detection using Contrastive Normalizing Flows

This work proposes the use of an unlabelled auxiliary dataset and a probabilistic outlier score for anomaly detection, and suggests that the contrastive normalizing flow can be used for various applications beyond anomaly detection.

Comparing Multiple Linear Regression, Deep Learning and Multiple Perceptron for Functional Points Estimation

Both the PyTorch-based deep learning model and the multiple perceptron model outperformed multiple linear regression and the baseline models on the experimental dataset; in the studied dataset, Adjusted Function Points may not contribute to higher accuracy than Function Point Categories.

Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced Data

A two-stage fine-tuning strategy is proposed: first, the final layer of the pretrained model is fine-tuned with a class-balanced re-weighting loss, which lets the model learn an initial representation of the specific task; then standard fine-tuning is performed.
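
A minimal sketch of the two stages, assuming a torch model whose head is exposed as `model.fc` (a placeholder name) and a classification objective:

```python
import torch
import torch.nn as nn

def run_epochs(model, loader, criterion, params, lr, epochs):
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()

def two_stage_finetune(model, loader, class_weights):
    # Stage 1: freeze everything except the final layer and train it with a
    # class-balanced re-weighting loss to get a task-specific initialization.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():        # `model.fc` assumed to be the head
        p.requires_grad = True
    run_epochs(model, loader, nn.CrossEntropyLoss(weight=class_weights),
               model.fc.parameters(), lr=1e-3, epochs=5)

    # Stage 2: unfreeze all layers and perform standard fine-tuning.
    for p in model.parameters():
        p.requires_grad = True
    run_epochs(model, loader, nn.CrossEntropyLoss(),
               model.parameters(), lr=1e-4, epochs=20)
```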

Taming the Long Tail of Deep Probabilistic Forecasting

This work identifies a long tail behavior in the performance of state-of-the-art deep learning methods on probabilistic forecasting and presents two moment-based tailedness measurement concepts to improve performance on the difficult tail examples: Pareto Loss and Kurtosis Loss.
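
The exact formulations are in the paper; as a heavily simplified illustration of a moment-based tailedness penalty, one can add the excess kurtosis of the per-example losses to the objective. This is an assumption for illustration, not the paper's precise Pareto or Kurtosis Loss.

```python
import torch

def kurtosis_penalized_loss(per_example_loss, lam=0.1, eps=1e-8):
    # Penalize a heavy-tailed loss distribution via its 4th standardized moment.
    mu = per_example_loss.mean()
    sigma = per_example_loss.std() + eps
    z = (per_example_loss - mu) / sigma
    kurtosis = (z ** 4).mean()
    return mu + lam * kurtosis
```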

Affective Retrofitted Word Embeddings

A novel retrofitting method that learns a non-linear transformation function mapping pre-trained embeddings to an affective vector space in a representation learning setting; it achieves better inter-cluster and intra-cluster distances for words sharing the same emotion, as evaluated through different cluster quality metrics.
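
A minimal sketch of the retrofitting idea, with an assumed two-layer map and a contrastive objective that pulls same-emotion words together; the paper's actual architecture and loss may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed non-linear map from 300-d pre-trained embeddings to a 64-d
# affective space (sizes are illustrative).
retrofit = nn.Sequential(nn.Linear(300, 128), nn.Tanh(), nn.Linear(128, 64))

def emotion_contrastive_loss(emb, emotion_ids, margin=1.0):
    z = retrofit(emb)                                    # map to affective space
    d = torch.cdist(z, z)                                # pairwise distances
    same = emotion_ids.view(-1, 1) == emotion_ids.view(1, -1)
    pos = d[same].mean()                                 # pull same-emotion pairs together
    neg = F.relu(margin - d[~same]).mean()               # push different emotions apart
    return pos + neg
```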

An Adaptive Sampling Framework for Life Cycle Degradation Monitoring

An adaptive sampling framework over segment intervals is proposed to monitor mechanical degradation, building on a summary of the problems with existing approaches; the results are closely tied to the data status and the degradation indicators used.

Variation-based Cause Effect Identification

A variation-based cause effect identification framework for causal discovery in bivariate systems from a single observational setting, which relies on the principle of independence of cause and mechanism (ICM) under the assumption of an existing acyclic causal link, and offers a practical realization of this principle.
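
A generic realization of the ICM principle for bivariate data (a common additive-noise baseline, not necessarily the paper's exact variation-based method) fits regressions in both directions and prefers the direction whose residuals are less dependent on the input.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def dependence(a, b):
    # Crude dependence proxy: |corr(a, b)| + |corr(a^2, b)|. A kernel-based
    # measure such as HSIC would be the more principled choice.
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return abs(np.corrcoef(a, b)[0, 1]) + abs(np.corrcoef(a * a, b)[0, 1])

def infer_direction(x, y):
    # Fit y = f(x) + e and x = g(y) + e' and compare residual dependence.
    rxy = y - RandomForestRegressor().fit(x[:, None], y).predict(x[:, None])
    ryx = x - RandomForestRegressor().fit(y[:, None], x).predict(y[:, None])
    return "x->y" if dependence(x, rxy) < dependence(y, ryx) else "y->x"
```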

References

Imbalanced regression and extreme value prediction

This paper proposes SERA, a new evaluation metric capable of assessing the effectiveness of models, and of optimising them, for the prediction of extreme values while penalising severe model bias.
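
A numeric sketch of SERA, assuming a relevance function `phi` mapping targets to [0, 1] (high for extreme values) is supplied: integrate the squared error of the cases whose relevance exceeds a threshold t, over t in [0, 1].

```python
import numpy as np

def sera(y_true, y_pred, phi, steps=1000):
    rel = phi(y_true)                        # relevance of each case, in [0, 1]
    ts = np.linspace(0.0, 1.0, steps)
    # SER_t: squared errors restricted to cases with relevance >= t.
    ser = [np.sum((y_pred - y_true)[rel >= t] ** 2) for t in ts]
    return np.trapz(ser, ts)                 # area under the SER_t curve
```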

SMOGN: a Pre-processing Approach for Imbalanced Regression

The proposed algorithm, SMOGN, combines two existing proposals, addressing problems detected in both; it is shown to have advantages over other approaches and to affect the learners used in different ways.

SMOTE for Regression

A modification of the well-known SMOTE algorithm that allows its use on regression tasks, changing the distribution of the given training set to reduce the imbalance between the rare target cases and the most frequent ones.
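
The core SMOTER step can be sketched as interpolating a rare case with one of its nearest neighbors and applying the same interpolation to the target; the original algorithm additionally weights the target by the distances to the two seed cases.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smoter_sample(X_rare, y_rare, k=5, rng=np.random.default_rng(0)):
    # X_rare, y_rare: features and targets of the rare cases only.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_rare)
    i = rng.integers(len(X_rare))                 # pick a rare seed case
    _, idx = nn.kneighbors(X_rare[i : i + 1])
    j = rng.choice(idx[0][1:])                    # a random neighbor (not i itself)
    lam = rng.random()                            # interpolation coefficient
    x_new = X_rare[i] + lam * (X_rare[j] - X_rare[i])
    y_new = y_rare[i] + lam * (y_rare[j] - y_rare[i])
    return x_new, y_new
```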

Kernel density estimation based sampling for imbalanced class distribution

A Survey of Predictive Modeling on Imbalanced Domains

The main challenges raised by imbalanced domains are discussed, a definition of the problem is proposed, the main approaches to these tasks are described, and a taxonomy of the methods is proposed.

Learning Deep Representation for Imbalanced Classification

The representation learned by this approach, when combined with a simple k-nearest neighbor (kNN) algorithm, shows significant improvements over existing methods on both high- and low-level vision classification tasks that exhibit imbalanced class distribution.

Class-Balanced Loss Based on Effective Number of Samples

This work designs a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, thereby yielding a class-balanced loss and introduces a novel theoretical framework to measure data overlap by associating with each sample a small neighboring region rather than a single point.
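
The effective-number weighting follows the paper's closed form E_n = (1 - beta^n) / (1 - beta), with the per-class weight proportional to 1/E_n:

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.9999):
    # Effective number of samples per class: E_n = (1 - beta^n) / (1 - beta).
    eff_num = (1.0 - np.power(beta, samples_per_class)) / (1.0 - beta)
    w = 1.0 / eff_num
    return w * len(samples_per_class) / w.sum()   # normalize to sum to C

# e.g. class_balanced_weights([5000, 500, 50]) up-weights the rare classes;
# the weights then feed into a weighted cross-entropy or sigmoid loss.
```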

Learning from imbalanced data: open challenges and future directions

B. Krawczyk. Progress in Artificial Intelligence, 2016.

Seven vital areas of research on this topic are identified, covering the full spectrum of learning from imbalanced data: classification, regression, clustering, data streams, big data analytics and applications, e.g., in social media and computer vision.

ADASYN: Adaptive synthetic sampling approach for imbalanced learning

Simulation analyses on several machine learning data sets show the effectiveness of the ADASYN sampling approach across five evaluation metrics.
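
For a quick usage sketch, the imbalanced-learn library ships an ADASYN implementation; the dataset below is illustrative.

```python
from imblearn.over_sampling import ADASYN
from sklearn.datasets import make_classification

# Illustrative imbalanced dataset: roughly 90% majority, 10% minority.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
# ADASYN decides per minority point how many synthetic samples to create,
# based on how majority-dominated its neighborhood is.
X_res, y_res = ADASYN(n_neighbors=5, random_state=0).fit_resample(X, y)
```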