Accelerating Extreme Classification via Adaptive Feature Agglomeration

@article{Jalan2019AcceleratingEC,
  title={Accelerating Extreme Classification via Adaptive Feature Agglomeration},
  author={Ankit Jalan and Purushottam Kar},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.11769}
}
Extreme classification seeks to assign each data point the most relevant labels from a universe of a million or more labels. This task faces the dual challenge of high precision and scalability, with millisecond-level prediction times being the benchmark. We propose DEFRAG, an adaptive feature agglomeration technique to accelerate extreme classification algorithms. Unlike past works on feature clustering and selection, DEFRAG distinguishes itself in being able to scale to millions of…
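The agglomeration idea can be illustrated with a toy numpy sketch (illustrative only, not DEFRAG itself, which adapts its clusters to the learning task): feature columns are grouped by a small k-means pass, and each group is summed into a single column, shrinking the feature dimension before training.

```python
import numpy as np

def agglomerate_features(X, n_clusters, seed=0):
    """Group the d columns of X into n_clusters buckets and sum each
    bucket into one column. A toy stand-in for feature agglomeration:
    buckets come from a few k-means iterations over the feature columns."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    cols = X.T  # each feature described by its values over the data
    centers = cols[rng.choice(d, n_clusters, replace=False)]
    for _ in range(10):
        dist = ((cols[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for k in range(n_clusters):
            members = cols[assign == k]
            if len(members):
                centers[k] = members.mean(0)
    # aggregation matrix G: column j of X feeds output column assign[j]
    G = np.zeros((d, n_clusters))
    G[np.arange(d), assign] = 1.0
    return X @ G, assign
```

Since every original column is routed to exactly one aggregated column, the total mass of the matrix is preserved while the dimension drops from d to n_clusters.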

ECLARE: Extreme Classification with Label Graph Correlations

TLDR
This paper presents ECLARE, a scalable deep learning architecture that incorporates not only label text, but also label correlations, to offer accurate real-time predictions within a few milliseconds.

Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels

TLDR
This work considers common loss functions that decompose over labels, proposes switching to convex surrogates of the 0-1 loss, and derives unbiased estimates that compensate for missing labels, following Natarajan et al.

Multilabel Classification by Hierarchical Partitioning and Data-dependent Grouping

TLDR
This paper presents a novel data-dependent grouping approach that uses a group construction based on a low-rank Nonnegative Matrix Factorization of the label matrix of training instances to solve the classification problem in a much lower-dimensional space, and then obtains labels in the original space using an appropriately defined lifting.
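As a rough illustration of the grouping step (a toy sketch, not the paper's pipeline; the function and parameter names are illustrative), one can factor the n x L label matrix with multiplicative-update NMF and assign each label to its dominant latent factor:

```python
import numpy as np

def nmf_label_groups(Y, k, iters=200, seed=0):
    """Factor the n x L label matrix Y ~ W @ H (W: n x k, H: k x L)
    with standard multiplicative updates, then group label j by the
    latent factor where its loading H[:, j] is largest."""
    rng = np.random.default_rng(seed)
    n, L = Y.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, L)) + 0.1
    eps = 1e-9
    for _ in range(iters):
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    return H.argmax(axis=0)
```

The classifier can then operate per group in the k-dimensional latent space rather than over all L labels at once.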

Matching Neural Network for Extreme Multi-Label Learning

TLDR
The Matching Neural Network (MNN) learns two neural mapping functions that encode feature vectors and label vectors into distributed representations; a noise-contrastive loss guides the training of both functions so that matched features and labels have similar representations as measured by cosine similarity.
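A minimal sketch of the matching objective, assuming a softmax-style noise-contrastive form over cosine similarities (the paper's exact loss may differ; names here are illustrative):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def nce_matching_loss(f, pos, negs):
    """Push the encoded feature f toward its matched label embedding
    pos and away from sampled 'noise' label embeddings negs, via
    softmax cross-entropy over cosine-similarity logits."""
    logits = np.array([cosine(f, pos)] + [cosine(f, n) for n in negs])
    logits -= logits.max()  # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0])  # matched pair is the true class
```

The loss is smallest when f aligns with its matched label embedding and is near-orthogonal to the noise embeddings.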

Generalized Zero-Shot Extreme Multi-label Learning

TLDR
This paper proposes a novel approach called ZestXML for the task of Generalized Zero-shot XML (GZXML), where relevant labels have to be chosen from all available seen and unseen labels; it learns to project a data point's features close to the features of its relevant labels through a highly sparsified linear transform.

The Emerging Trends of Multi-Label Learning

TLDR
There has been a lack of systematic studies that focus explicitly on analyzing the emerging trends and new challenges of multi-label learning in the era of big data; this comprehensive survey aims to fulfill that mission and delineate future research directions and new applications.

Bonsai: diverse and shallow trees for extreme multi-label classification

TLDR
A suite of algorithms called Bonsai is developed, which generalizes the notion of label representation in XMC and partitions the labels in the representation space to learn diverse, shallow trees, achieving the best of both worlds.

Towards multi-label classification: Next step of machine learning for microbiome research

Learning with Holographic Reduced Representations

TLDR
Using multi-label classification, it is demonstrated how to leverage the symbolic HRR properties to develop an output layer and loss function that can learn effectively, allowing an investigation of some of the pros and cons of an HRR neuro-symbolic learning approach.
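The core HRR operations the paper builds on, binding via circular convolution and approximate unbinding via correlation, can be sketched in a few lines of numpy (a textbook illustration, not the paper's output layer):

```python
import numpy as np

def bind(a, b):
    # HRR binding: circular convolution, computed in the Fourier domain
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):
    # approximate unbinding: bind with the involution (approximate
    # inverse) of a; recovery is noisy but close for high dimensions
    a_inv = np.concatenate(([a[0]], a[:0:-1]))
    return bind(c, a_inv)
```

For random high-dimensional vectors, `unbind(bind(a, b), a)` yields a noisy copy of `b` that is far more similar to `b` than to any unrelated vector, which is what makes a cleanup/nearest-neighbor step over label vectors viable.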

References

Showing 1-10 of 20 references

FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning

TLDR
The objective in this paper is to develop an extreme multi-label classifier that is faster to train and more accurate at prediction than the state-of-the-art Multi-label Random Forest algorithm and the Label Partitioning for Sub-linear Ranking algorithm.

CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning

TLDR
A new random-forest-based algorithm with a very fast partitioning approach, called CRAFTML, is introduced; it outperforms the other tree-based approaches and is competitive with the best state-of-the-art methods, which run on hundred-core machines.

PPDsparse: A Parallel Primal-Dual Sparse Method for Extreme Classification

TLDR
This work extends PD-Sparse to be efficiently parallelized in large-scale distributed settings, and introduces separable loss functions that can scale out the training, with network communication and space efficiency comparable to those in one-versus-all approaches while maintaining an overall complexity sub-linear in the number of classes.

Data scarcity, robustness and extreme multi-label classification

TLDR
It is shown that minimizing Hamming loss with appropriate regularization surpasses many state-of-the-art methods for tail-label detection in XMC, and the spectral properties of label graphs are investigated to provide novel insights into the conditions governing the performance of the Hamming-loss-based one-vs-rest scheme.

DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification

TLDR
This work presents DiSMEC, a large-scale distributed framework for learning one-versus-rest linear classifiers coupled with explicit capacity control over model size, and conducts extensive empirical evaluation on publicly available real-world datasets consisting of up to 670,000 labels.

Sparse Local Embeddings for Extreme Multi-label Classification

TLDR
The SLEEC classifier is developed for learning a small ensemble of local distance-preserving embeddings which can accurately predict infrequently occurring (tail) labels, and it makes significantly more accurate predictions than state-of-the-art methods, including both embedding-based and tree-based methods.

Large-scale Multi-label Learning with Missing Labels

TLDR
This paper studies the multi-label problem in a generic empirical risk minimization (ERM) framework and develops techniques that exploit the structure of specific loss functions - such as the squared loss function - to obtain efficient algorithms.
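For the squared loss specifically, the multi-label ERM decouples into one ridge regression per label, all sharing the same Gram matrix, so a single factorization fits every label column at once; a minimal sketch of that structure (illustrative, not the paper's large-scale algorithm):

```python
import numpy as np

def multilabel_ridge(X, Y, lam=1.0):
    """Solve min_W ||X W - Y||_F^2 + lam ||W||_F^2 in closed form.
    X: n x d features, Y: n x L label matrix. The normal-equation
    matrix X^T X + lam I is shared by all L label columns, so one
    solve handles the whole label matrix."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
```

This shared structure is what makes the squared loss attractive at scale: the per-label cost after the factorization is just a back-substitution.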

Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications

The choice of the loss function is critical in extreme multi-label learning, where the objective is to annotate each data point with the most relevant subset of labels from an extremely large label set.

AnnexML: Approximate Nearest Neighbor Search for Extreme Multi-label Classification

TLDR
Experimental results show that the novel graph-embedding method called AnnexML can significantly improve prediction accuracy, especially on datasets with a larger label space, and improves the trade-off between prediction time and accuracy.

Balanced Clustering with Least Square Regression

TLDR
This paper proposes a novel and simple clustering method, referred to as Balanced Clustering with Least Square regression (BCLS), which minimizes a least-squares linear regression objective with a balance constraint to regularize the clustering model.
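The balance idea can be sketched with a toy surrogate (greedy capped assignment rather than BCLS's actual least-squares formulation; all names are illustrative): alternate k-means-style center updates with an assignment pass that caps every cluster at ceil(n/k) points.

```python
import numpy as np

def balanced_clusters(X, k, iters=10, seed=0):
    """Toy balanced clustering: k-means center updates combined with a
    greedy assignment that caps each cluster at ceil(n/k) points, so
    cluster sizes stay (near-)equal."""
    rng = np.random.default_rng(seed)
    n = len(X)
    cap = -(-n // k)  # ceil(n / k)
    centers = X[rng.choice(n, k, replace=False)].astype(float)
    for _ in range(iters):
        dist = ((X[:, None] - centers[None]) ** 2).sum(-1)
        # assign points with the clearest preference first, honoring caps
        order = np.argsort(dist.min(1))
        counts = np.zeros(k, dtype=int)
        assign = np.full(n, -1)
        for i in order:
            for c in np.argsort(dist[i]):
                if counts[c] < cap:
                    assign[i] = c
                    counts[c] += 1
                    break
        for c in range(k):
            centers[c] = X[assign == c].mean(0)
    return assign
```

When n is divisible by k, the cap forces every cluster to exactly n/k points, which is the balance property the least-squares regularizer in BCLS encourages softly.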