Local Contrastive Feature Learning for Tabular Data

  • Zhabiz Gharibshah, Xingquan Zhu
  • Published 17 October 2022
  • Computer Science
  • Proceedings of the 31st ACM International Conference on Information & Knowledge Management
Contrastive self-supervised learning has been successfully used in many domains, such as images, texts, graphs, etc., to learn features without requiring label information. In this paper, we propose a new local contrastive feature learning (LoCL) framework, and our theme is to learn local patterns/features from tabular data. In order to create a niche for local learning, we use feature correlations to create a maximum-spanning tree, and break the tree into feature subsets, with strongly… 
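
The feature-grouping step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes absolute Pearson correlation as the edge weight, Kruskal's algorithm for the maximum spanning tree, and a cutting rule that removes the weakest tree edges until `n_groups` subsets remain. The truncated abstract does not say how the tree is broken, so the cutting rule and the names `correlation_feature_groups` and `n_groups` are hypothetical.

```python
import numpy as np

def correlation_feature_groups(X, n_groups):
    """Group the columns of X using a maximum spanning tree on |Pearson correlation|.

    Assumed cutting rule (not taken from the paper): keep only the strongest
    n_features - n_groups tree edges, which leaves n_groups connected subsets.
    """
    n_features = X.shape[1]
    corr = np.abs(np.corrcoef(X, rowvar=False))

    # All feature pairs, strongest correlation first (Kruskal, maximum variant).
    edges = sorted(
        ((corr[i, j], i, j)
         for i in range(n_features)
         for j in range(i + 1, n_features)),
        reverse=True,
    )

    parent = list(range(n_features))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Kruskal adds tree edges in decreasing weight order, so stopping early is
    # equivalent to building the full tree and deleting its weakest edges.
    edges_to_keep = n_features - n_groups
    kept = 0
    for _, i, j in edges:
        if kept == edges_to_keep:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            kept += 1

    groups = {}
    for f in range(n_features):
        groups.setdefault(find(f), []).append(f)
    return sorted(groups.values())
```

Each returned list holds the column indices of one subset of mutually correlated features, which could then be fed to a separate local learner.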

Barlow Twins: Self-Supervised Learning via Redundancy Reduction

This work proposes an objective function that naturally avoids collapse by measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible.
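
The objective summarized above can be written in a few lines of NumPy. The sketch below is illustrative only: it standardizes each embedding dimension over the batch, forms the cross-correlation matrix, and penalizes its distance from the identity. The function name `barlow_twins_loss`, the off-diagonal weight `lam`, and the stabilizer `eps` are choices made for this example.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3, eps=1e-12):
    """z_a, z_b: (batch, dim) embeddings of two distorted views of the same samples."""
    n = z_a.shape[0]
    # Standardize each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)

    # (dim, dim) cross-correlation matrix between the two views.
    c = z_a.T @ z_b / n

    # Diagonal entries are pushed toward 1, off-diagonal entries toward 0.
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)
    return on_diag + lam * off_diag
```

The loss is small when the two views produce matching, decorrelated embedding dimensions and large when the views are unrelated.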

VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain

This paper creates a novel pretext task of estimating mask vectors from corrupted tabular data in addition to the reconstruction pretext task for self-supervised learning, and introduces a novel tabular data augmentation method for self- and semi-supervised learning frameworks.
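
One way to generate the corrupted samples and mask vectors for such a pretext task is sketched below. It is an assumption-laden illustration rather than the paper's code: entries are masked with a Bernoulli probability `p_mask`, and masked entries are replaced by values from an independently shuffled copy of the same column. The function name `vime_corrupt` is hypothetical.

```python
import numpy as np

def vime_corrupt(X, p_mask, rng):
    """Return (mask, X_corrupted) for a (n_samples, n_features) array X."""
    n, d = X.shape
    # Bernoulli mask: 1 marks an entry selected for corruption.
    mask = rng.binomial(1, p_mask, size=(n, d))

    # Shuffle every column independently so that replacement values follow
    # that feature's empirical marginal distribution.
    X_shuffled = np.column_stack([rng.permutation(X[:, j]) for j in range(d)])
    X_corrupted = np.where(mask == 1, X_shuffled, X)

    # A selected entry may receive an identical value, so the mask used as the
    # prediction target is recomputed from the entries that actually changed.
    effective_mask = (X_corrupted != X).astype(int)
    return effective_mask, X_corrupted
```

An encoder trained on `X_corrupted` would then be asked to predict `effective_mask` and to reconstruct `X`.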

Deep Learning for User Interest and Response Prediction in Online Display Advertising

Experiments and comparisons on real-world data show that, compared to existing static set-based approaches, considering sequences and temporal variance of user requests results in improvements in user Ad response prediction and campaign specific user Ad click prediction.

Anomaly Detection for Tabular Data with Internal Contrastive Learning

Why do tree-based models still outperform deep learning on tabular data?

Results show that tree-based models remain state-of-the-art on medium-sized data even without accounting for their superior speed, which leads to a series of challenges that should guide researchers aiming to build tabular-specific NNs.

Data-Efficient and Interpretable Tabular Anomaly Detection

A novel AD framework is proposed that adapts a white-box model class, Generalized Additive Models, to detect anomalies using a partial identification objective, which naturally handles noisy or heterogeneous features and can incorporate a small amount of labeled data to further boost anomaly detection performance in semi-supervised settings.

SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning

This paper introduces a new framework, Subsetting features of Tabular data (SubTab), that turns the task of learning from tabular data into a multi-view representation learning problem by dividing the input features to multiple subsets and argues that reconstructing the data from the subset of its features rather than its corrupted version in an autoencoder setting can better capture its underlying latent representation.
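
The idea of dividing input features into multiple subsets can be illustrated with a small helper. This is a sketch under stated assumptions, not the SubTab implementation: columns are split into contiguous blocks, and each view optionally borrows `overlap` columns from the following block (wrapping around). The name `split_feature_views` and both parameters are hypothetical.

```python
import numpy as np

def split_feature_views(X, n_subsets, overlap=0):
    """Split the columns of X into n_subsets views of neighbouring features."""
    d = X.shape[1]
    blocks = np.array_split(np.arange(d), n_subsets)
    views = []
    for k, block in enumerate(blocks):
        if n_subsets > 1:
            # Borrow the first `overlap` columns of the next block (wraps around).
            extra = blocks[(k + 1) % n_subsets][:overlap]
        else:
            extra = np.array([], dtype=int)
        cols = np.concatenate([block, extra])
        views.append(X[:, cols])
    return views
```

Each view could then be passed through a shared encoder, with the model asked to reconstruct the full row from a single view.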

Contrastive Mixup: Self- and Semi-Supervised learning for Tabular Domain

This work introduces Contrastive Mixup, a semi-supervised learning framework for tabular data, and demonstrates its effectiveness in limited annotated data settings.

Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking

This work proposes a method based on contrastive learning, which takes into account the possible variations in user's behavior sequences, and proposes three data augmentation strategies to generate similar variants of user behavior sequences and contrast them with other sequences.

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption

SCARF is proposed, a simple, widely-applicable technique for contrastive learning, where views are formed by corrupting a random subset of features, that complements existing strategies and outperforms alternatives like autoencoders.
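
Forming a view by corrupting a random subset of features might look like the sketch below. It assumes that a fixed fraction `corruption_rate` of features is chosen per row and that each chosen entry is replaced by the value of the same feature in a randomly drawn row. The function name `scarf_corrupt` and these details are illustrative assumptions, not a reproduction of the authors' code.

```python
import numpy as np

def scarf_corrupt(X, corruption_rate, rng):
    """Return (X_view, mask) where mask marks the corrupted entries of X."""
    n, d = X.shape
    n_corrupt = int(corruption_rate * d)

    # For every row, pick n_corrupt distinct feature indices uniformly at random.
    order = np.argsort(rng.random((n, d)), axis=1)
    mask = np.zeros((n, d), dtype=bool)
    np.put_along_axis(mask, order[:, :n_corrupt], True, axis=1)

    # Replacement for cell (i, j) is X[r, j] for a random donor row r, i.e. a
    # draw from feature j's empirical marginal distribution.
    donor_rows = rng.integers(0, n, size=(n, d))
    X_random = X[donor_rows, np.arange(d)]

    return np.where(mask, X_random, X), mask
```

The original row and its corrupted view would then serve as a positive pair in a contrastive loss.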