On Missing Labels, Long-tails and Propensities in Extreme Multi-label Classification

@article{Schultheis2022OnML,
  title={On Missing Labels, Long-tails and Propensities in Extreme Multi-label Classification},
  author={Erik Schultheis and Marek Wydmuch and Rohit Babbar and Krzysztof Dembczy{\'n}ski},
  journal={Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  year={2022}
}
The propensity model introduced by Jain et al. has become a standard approach for dealing with missing and long-tail labels in extreme multi-label classification (XMLC). In this paper, we critically revisit this approach, showing that despite its theoretical soundness, its application in contemporary XMLC works is debatable. We exhaustively discuss the flaws of the propensity-based approach and present several recipes, some of them related to solutions used in search engines and recommender… 
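For reference, the propensity model of Jain et al. assigns a label l that has N_l positive examples among N training points the propensity p_l = 1 / (1 + C e^{-A log(N_l + B)}), with C = (log N - 1)(B + 1)^A, and propensity-scored precision@k weights each correct top-k label by 1/p_l instead of 1. Below is a minimal Python sketch of both. The function names and label counts are hypothetical, the defaults A = 0.55 and B = 1.5 are assumed here as commonly used values (A and B are dataset-dependent), and the metric is left unnormalized.

```python
import math


def jain_propensity(n_l: int, n: int, a: float = 0.55, b: float = 1.5) -> float:
    """Propensity p_l = 1 / (1 + C * exp(-A * log(N_l + B))),
    with C = (log N - 1) * (B + 1)**A, following the model of Jain et al.

    n_l:  number of training points tagged with label l (N_l)
    n:    number of training points (N)
    a, b: dataset-dependent constants A and B (assumed defaults)
    """
    c = (math.log(n) - 1.0) * (b + 1.0) ** a
    return 1.0 / (1.0 + c * math.exp(-a * math.log(n_l + b)))


def psp_at_k(ranking, relevant, propensities, k):
    """Unnormalized propensity-scored precision@k: each relevant label among
    the top-k predictions contributes 1/p_l instead of 1."""
    return sum(1.0 / propensities[l] for l in ranking[:k] if l in relevant) / k


# Hypothetical label counts for a training set of N = 10,000 points.
n = 10_000
counts = {"head": 5_000, "mid": 50, "tail": 1}
props = {name: jain_propensity(count, n) for name, count in counts.items()}
print(props)  # rarer labels receive smaller propensities
print(psp_at_k(["tail", "head", "mid"], {"tail", "head"}, props, k=3))
```

With N = 10,000, a label seen exactly once gets p_l = 1 / log N (about 0.11), so a correct prediction of it counts roughly eight times as much as a correct prediction of the label seen 5,000 times (p_l of about 0.89).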


References

Showing 10 of 43 references

Propensity-scored Probabilistic Label Trees

TLDR
This work introduces an inference procedure based on the A*-search algorithm that efficiently finds the optimal predictions under the propensity model for probabilistic label trees (PLTs), a popular approach for XMLC problems.
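The A*-search idea can be illustrated on a toy tree. In a PLT, a label's marginal probability η_l is the product of the conditional node probabilities on its root-to-leaf path, and expected propensity-scored precision@k is maximized by the k labels with the largest η_l / p_l (here η_l is the probability of the observed label that the tree estimates). Since every remaining factor on a path is at most 1, the product so far times the largest 1/p_l below a node is an admissible upper bound, so a best-first search that pops nodes by this bound returns leaves in exact score order. This is a minimal sketch, not the cited paper's implementation: the tree, the fixed node probabilities (standing in for trained node classifiers), the inverse propensities and the function name top_k_propensity_scored are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_k_propensity_scored(children: Dict[str, List[str]],
                            node_prob: Dict[str, float],
                            inv_prop: Dict[str, float],
                            root: str, k: int) -> List[Tuple[str, float]]:
    """Return up to k leaves with the largest eta_l / p_l, where eta_l is the
    product of conditional node probabilities on the root-to-leaf path."""
    # h[v] = largest inverse propensity among the leaves below v. Remaining
    # conditional probabilities are <= 1, so g * h[v] never underestimates.
    h: Dict[str, float] = {}

    def fill(v: str) -> float:
        kids = children.get(v, [])
        h[v] = inv_prop[v] if not kids else max(fill(c) for c in kids)
        return h[v]

    fill(root)

    # Max-heap (negated priorities) keyed by the upper bound g * h[v].
    g_root = node_prob[root]
    heap = [(-g_root * h[root], root, g_root)]
    result: List[Tuple[str, float]] = []
    while heap and len(result) < k:
        neg_bound, v, g = heapq.heappop(heap)
        kids = children.get(v, [])
        if not kids:
            # At a leaf the bound is the exact score, and it is at least as
            # large as every bound still in the heap, so the leaf is next.
            result.append((v, -neg_bound))
            continue
        for c in kids:
            g_c = g * node_prob[c]
            heapq.heappush(heap, (-g_c * h[c], c, g_c))
    return result


# Hypothetical two-level tree over four labels a, b, c, d.
children = {"root": ["n1", "n2"], "n1": ["a", "b"], "n2": ["c", "d"]}
node_prob = {"root": 1.0, "n1": 0.9, "n2": 0.3, "a": 0.8, "b": 0.5, "c": 0.9, "d": 0.2}
inv_prop = {"a": 1.2, "b": 1.5, "c": 6.0, "d": 9.0}
print(top_k_propensity_scored(children, node_prob, inv_prop, "root", k=2))
```

On this toy input the search returns c (0.3 × 0.9 × 6.0 = 1.62) and then a (0.9 × 0.8 × 1.2 = 0.864): the rare label c outranks a even though a has the larger probability.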

Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications

The choice of the loss function is critical in extreme multi-label learning where the objective is to annotate each data point with the most relevant subset of labels from an extremely large label…

Unbiased Loss Functions for Multilabel Classification with Missing Labels

TLDR
This paper derives the unique unbiased estimators for the different multilabel reductions, including the non-decomposable ones. These estimators suffer from increased variance and may lead to ill-posed optimization problems, which are addressed by switching to convex upper bounds.

Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels

TLDR
This work considers common loss functions that decompose over labels, proposes switching to convex surrogates of the 0-1 loss, and computes unbiased estimates that compensate for missing labels following Natarajan et al.
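For a single label of a decomposable loss, the correction can be written down directly. If true positives are observed with propensity p and negatives are never flipped (one-sided noise in the framework of Natarajan et al.), the estimator ℓ̃(z, 1) = ℓ(z, 1)/p + (1 - 1/p) ℓ(z, 0), ℓ̃(z, 0) = ℓ(z, 0) has expectation equal to the loss on the unobserved true label. The Python sketch below checks this numerically; the choice of the logistic loss as ℓ and all function names are assumptions made for illustration only.

```python
import math


def logistic_loss(z: float, y: int) -> float:
    """Per-label logistic loss for score z and true label y in {0, 1}:
    log(1 + exp(-z)) if y == 1, log(1 + exp(z)) if y == 0."""
    return math.log1p(math.exp(-z if y == 1 else z))


def unbiased_loss(z: float, y_obs: int, p: float) -> float:
    """Unbiased estimate of the loss on the true label from the observed label,
    assuming positives are observed with propensity p and negatives are exact."""
    if y_obs == 1:
        return logistic_loss(z, 1) / p + (1.0 - 1.0 / p) * logistic_loss(z, 0)
    return logistic_loss(z, 0)


def expected_unbiased_loss(z: float, y_true: int, p: float) -> float:
    """Expectation over the missing-label process: a true positive is observed
    with probability p, a true negative is always observed as negative."""
    if y_true == 1:
        return p * unbiased_loss(z, 1, p) + (1.0 - p) * unbiased_loss(z, 0, p)
    return unbiased_loss(z, 0, p)


z, p = 2.0, 0.2
print(expected_unbiased_loss(z, 1, p), logistic_loss(z, 1))  # equal up to rounding
print(unbiased_loss(z, 1, p))  # negative for an observed positive
```

For the logistic loss the estimate for an observed positive simplifies to ℓ(z, 0) - z/p, which tends to minus infinity as z grows whenever p < 1; this unboundedness is one concrete way the unbiased objective becomes ill-posed.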

Does Tail Label Help for Large-Scale Multi-Label Learning

TLDR
A low-complexity large-scale multi-label learning algorithm is developed with the goal of facilitating fast prediction and compact models by adaptively trimming tail labels, without sacrificing much of the predictive performance of state-of-the-art approaches.

Taming Pretrained Transformers for Extreme Multi-label Text Classification

TLDR
This work proposes X-Transformer, the first scalable approach to fine-tuning deep transformer models for the XMC problem, which achieves new state-of-the-art results on four XMC benchmark datasets.

Multi-label learning with millions of labels: recommending advertiser bid phrases for web pages

TLDR
The paper demonstrates that the relevant subset of queries can be predicted efficiently from a large set of monetizable ones by posing the problem as a multi-label learning task in which each query is represented by a separate label.

Data scarcity, robustness and extreme multi-label classification

TLDR
The paper shows that minimizing Hamming loss with appropriate regularization surpasses many state-of-the-art methods for tail-label detection in XMC, and it investigates the spectral properties of label graphs to provide insight into the conditions governing the performance of the Hamming-loss-based one-vs-rest scheme.

AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification

TLDR
This work proposes a new label tree-based deep learning model for XMTC, called AttentionXML, with two unique features: a multi-label attention mechanism that takes raw text as input and captures the part of the text most relevant to each label, and a shallow and wide probabilistic label tree (PLT) that makes it possible to handle millions of labels, especially "tail labels".

Bonsai: diverse and shallow trees for extreme multi-label classification

TLDR
A suite of algorithms, called Bonsai, is developed, which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees, and achieves the best of both worlds.