Uncertainty in Extreme Multi-label Classification

Jyun-Yu Jiang, Wei-Cheng Chang, Jiong Zhang, Cho-Jui Hsieh, Hsiang-Fu Yu
Uncertainty quantification is one of the most crucial tasks for obtaining trustworthy and reliable machine learning models for decision making. However, most research in this domain has focused only on problems with small label spaces, ignoring eXtreme Multi-label Classification (XMC), an essential task in the era of big data for web-scale machine learning applications. Moreover, enormous label spaces could also lead to noisy retrieval results and intractable computational challenges for…

A no-regret generalization of hierarchical softmax to extreme multi-label classification

It is shown that PLTs are a no-regret multi-label generalization of HSM when precision@k is used as the model evaluation metric, and it is proved that the pick-one-label heuristic, a reduction technique from multi-label to multi-class that is routinely used along with HSM, is not consistent in general.
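The precision@k metric referenced above has a simple definition: the fraction of the top-k predicted labels that are truly relevant. A minimal sketch (the label names are hypothetical):

```python
def precision_at_k(relevant, ranked_labels, k):
    """Fraction of the top-k predicted labels that are truly relevant."""
    top_k = ranked_labels[:k]
    return sum(1 for label in top_k if label in relevant) / k

# Two of the top three predicted labels are relevant.
print(precision_at_k({"cat", "dog"}, ["cat", "fish", "dog", "bird"], 3))  # → 0.6666666666666666
```

In XMC, k is typically small (1, 3, or 5) while the ranked label list is drawn from a space of millions of candidates.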

Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications

The choice of the loss function is critical in extreme multi-label learning, where the objective is to annotate each data point with the most relevant subset of labels from an extremely large label set.

Can multi-label classification networks know what they don't know?

JointEnergy, a simple and effective method that estimates OOD indicator scores by aggregating label-wise energy scores across labels, is proposed; it can be interpreted mathematically from a joint likelihood perspective.
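A minimal sketch of the aggregation idea, assuming the common free-energy convention log(1 + exp(f_i)) for the label-wise energy (the exact form and sign convention used by JointEnergy may differ; the logits are hypothetical):

```python
import math

def joint_energy(logits):
    """Sum label-wise free energies log(1 + exp(f_i)) over all labels.
    Under this convention, in-distribution inputs tend to score higher
    because more labels fire with confident positive logits."""
    return sum(math.log1p(math.exp(f)) for f in logits)

in_dist = [5.0, 3.0, -2.0]   # some labels fire strongly
ood = [-4.0, -3.5, -5.0]     # no label fires
print(joint_energy(in_dist) > joint_energy(ood))  # → True
```

Because the score is a plain sum over labels, it scales linearly with the label space, which matters in the multi-label setting the paper targets.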

Uncertainty in Gradient Boosting via Ensembles

Experiments on a range of regression and classification datasets show that ensembles of gradient boosting models yield improved predictive performance, and measures of uncertainty successfully enable detection of out-of-domain inputs.
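The ensemble-disagreement idea above can be sketched in a few lines; the per-model predictions here are hypothetical stand-ins for the outputs of gradient boosting models trained with different random seeds:

```python
from statistics import mean, pvariance

def ensemble_predict(predictions):
    """Combine one input's predictions from all ensemble members into a
    mean prediction plus a disagreement score (variance across members),
    which can serve as an uncertainty measure for out-of-domain detection."""
    return mean(predictions), pvariance(predictions)

in_domain = [2.1, 2.0, 2.2, 2.1, 2.0]    # members agree: low uncertainty
out_domain = [1.0, 3.5, -0.5, 2.8, 0.2]  # members disagree: high uncertainty
print(ensemble_predict(in_domain)[1] < ensemble_predict(out_domain)[1])  # → True
```

Thresholding the disagreement score gives a simple detector: inputs whose ensemble variance exceeds the values seen on held-out in-domain data are flagged as out-of-domain.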

Bonsai: diverse and shallow trees for extreme multi-label classification

A suite of algorithms, called Bonsai, is developed; it generalizes the notion of label representation in XMC and partitions the labels in the representation space to learn shallow, diverse trees, achieving the best of both worlds.

DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification

This work presents DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers coupled with explicit capacity control to limit model size, and conducts extensive empirical evaluation on publicly available real-world datasets with up to 670,000 labels.

Taming Pretrained Transformers for Extreme Multi-label Text Classification

X-Transformer is proposed, the first scalable approach to fine-tuning deep transformer models for the XMC problem and achieves new state-of-the-art results on four XMC benchmark datasets.

AttentionXML: Label Tree-based Attention-Aware Deep Model for High-Performance Extreme Multi-Label Text Classification

This work proposes a new label tree-based deep learning model for XMTC, called AttentionXML, with two unique features: a multi-label attention mechanism with raw text as input, which captures the most relevant parts of the text for each label; and a shallow, wide probabilistic label tree (PLT), which can handle millions of labels, especially "tail labels".

Deep Learning for Extreme Multi-label Text Classification

This paper presents the first attempt at applying deep learning to XMTC, with a family of new Convolutional Neural Network models tailored specifically for multi-label classification.

Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

A novel recursive approach, XR-Transformer, is proposed to accelerate the procedure by recursively fine-tuning transformer models on a series of multi-resolution objectives related to the original XMC objective; it takes significantly less training time than other transformer-based XMC models while yielding better state-of-the-art results.