Corpus ID: 220281081

Unbiased Loss Functions for Extreme Classification With Missing Labels

Erik Schultheis, Mohammadreza Qaraei, Priyanshu Gupta, Rohit Babbar
The goal in extreme multi-label classification (XMC) is to tag an instance with a small subset of relevant labels from an extremely large set of possible labels. In addition to the computational burden arising from the large number of training instances, features, and labels, problems in XMC face two statistical challenges: (i) a large number of 'tail labels', i.e., those which occur very infrequently, and (ii) missing labels, as it is virtually impossible to manually assign every relevant label… 
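The abstract does not spell out the estimator itself, but the standard inverse-propensity correction used in this line of work can be sketched as follows. The function and parameter names are illustrative, not taken from the paper: each observed positive label is up-weighted by the inverse of its propensity (the probability that a true positive was actually observed), so that the weighted loss is, in expectation over the label-missing process, equal to the loss on fully observed labels.

```python
import numpy as np

def unbiased_bce(scores, y_obs, propensities, eps=1e-12):
    """Propensity-weighted binary cross-entropy (illustrative sketch).

    scores       : predicted probabilities per label
    y_obs        : observed (possibly incomplete) 0/1 labels
    propensities : P(label observed | label is truly relevant)
    """
    p = np.clip(scores, eps, 1 - eps)
    w = y_obs / propensities          # 1/p_l for observed positives, 0 otherwise
    loss_pos = -np.log(p)             # loss incurred if the true label is 1
    loss_neg = -np.log(1 - p)         # loss incurred if the true label is 0
    # unbiased per-label loss: w * l(1) + (1 - w) * l(0)
    return np.mean(w * loss_pos + (1.0 - w) * loss_neg)
```

With all propensities equal to 1 (no missing labels) this reduces to plain binary cross-entropy; lower propensities increase the weight of the corresponding observed positives.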


Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance
This paper develops a framework for online evaluation based on adaptive importance sampling, establishes strong consistency and a central limit theorem for the resulting performance estimates, and instantiates the framework with worked examples that leverage Dirichlet-tree models.
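As a generic illustration of the importance-sampling idea behind that paper (not its adaptive, Dirichlet-tree-based scheme; all names here are hypothetical), an unbiased estimate of a mean over a heavily imbalanced population, using a non-uniform proposal over which items to label, looks like:

```python
import numpy as np

def importance_estimate(values, proposal, n_samples, rng):
    """Estimate mean(values) by drawing indices from `proposal`
    and reweighting each draw by 1 / (N * q_i), which makes the
    estimator unbiased for the uniform mean."""
    n = len(values)
    q = proposal / proposal.sum()          # normalized sampling distribution
    idx = rng.choice(n, size=n_samples, p=q)
    weights = 1.0 / (n * q[idx])           # importance weights
    return np.mean(weights * values[idx])
```

Oversampling the rare positive class (large proposal mass on suspected positives) reduces the variance of the estimate while the reweighting keeps it unbiased.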
Prediction in the Presence of Response-Dependent Missing Labels
This work develops a new methodology and non-convex algorithm, P(ositive) U(nlabeled) O(ccurrence) M(agnitude) M(ixture), which jointly estimates the occurrence and detection likelihood of positive samples, utilizing prior knowledge of the detection mechanism.


Data scarcity, robustness and extreme multi-label classification
It is shown that minimizing Hamming loss with appropriate regularization surpasses many state-of-the-art methods for tail-label detection in XMC, and the spectral properties of label graphs are investigated to provide novel insights into the conditions governing the performance of the Hamming-loss-based one-vs-rest scheme.
Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications
The choice of the loss function is critical in extreme multi-label learning, where the objective is to annotate each data point with the most relevant subset of labels from an extremely large label set.
Large-scale Multi-label Learning with Missing Labels
This paper studies the multi-label problem in a generic empirical risk minimization (ERM) framework and develops techniques that exploit the structure of specific loss functions - such as the squared loss function - to obtain efficient algorithms.
DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification
This work presents DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers coupled with explicit capacity control of model size, and conducts extensive empirical evaluation on publicly available real-world datasets with up to 670,000 labels.
Sparse Local Embeddings for Extreme Multi-label Classification
The SLEEC classifier is developed for learning a small ensemble of local distance-preserving embeddings which can accurately predict infrequently occurring (tail) labels, and can make significantly more accurate predictions than state-of-the-art methods, including both embedding-based and tree-based methods.
Does Tail Label Help for Large-Scale Multi-Label Learning
A low-complexity large-scale multi-label learning algorithm is developed with the goal of fast prediction and compact models, trimming tail labels adaptively without sacrificing much predictive performance relative to state-of-the-art approaches.
A no-regret generalization of hierarchical softmax to extreme multi-label classification
It is shown that PLTs are a no-regret multi-label generalization of HSM when precision@k is used as the model evaluation metric, and it is proved that the pick-one-label heuristic (a reduction technique from multi-label to multi-class that is routinely used along with HSM) is not consistent in general.
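For reference, precision@k, the metric referred to in the summary above and the standard ranking metric in XMC, can be computed for a single instance as follows (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Fraction of the top-k scored labels that are relevant."""
    topk = np.argsort(-scores)[:k]      # indices of the k highest scores
    return labels[topk].sum() / k
```

In practice the value is averaged over all test instances, typically for k in {1, 3, 5}.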
PD-Sparse : A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification
A Fully-Corrective Block-Coordinate Frank-Wolfe (FC-BCFW) algorithm is proposed that exploits both primal and dual sparsity to achieve complexity sublinear in the number of primal and dual variables, and achieves significantly higher accuracy than existing extreme classification approaches.
Bonsai: diverse and shallow trees for extreme multi-label classification
A suite of algorithms, called Bonsai, is developed, which generalizes the notion of label representation in XMC and partitions the labels in the representation space to learn shallow, diverse trees, achieving the best of both worlds.
Stochastic Negative Mining for Learning with Large Output Spaces
This work defines a family of surrogate losses and shows that they are calibrated and convex under certain conditions on the loss parameters and data distribution, thereby establishing a statistical and analytical basis for using these losses.