CatBoost: gradient boosting with categorical features support
@article{Dorogush2018CatBoostGB,
  title   = {CatBoost: gradient boosting with categorical features support},
  author  = {Anna Veronika Dorogush and Vasily Ershov and Andrey Gulin},
  journal = {ArXiv},
  volume  = {abs/1810.11363},
  year    = {2018}
}
In this paper we present CatBoost, a new open-sourced gradient boosting library that successfully handles categorical features and outperforms existing publicly available implementations of gradient boosting in terms of quality on a set of popular publicly available datasets. The library has a GPU implementation of the learning algorithm and a CPU implementation of the scoring algorithm, both of which are significantly faster than other gradient boosting libraries on ensembles of similar sizes.
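A core ingredient behind CatBoost's handling of categorical features is encoding each category with target statistics computed only from examples seen earlier in a permutation, which avoids target leakage. The following is a minimal toy sketch of that idea, not CatBoost's actual implementation; the function name and the smoothing parameters (`prior`, `prior_weight`) are illustrative assumptions.

```python
def ordered_target_statistics(categories, targets, prior=0.5, prior_weight=1.0):
    """Toy ordered target-statistic encoding of one categorical column.

    Each example is encoded with the smoothed mean target of *previously
    seen* examples sharing its category, so an example's own label never
    leaks into its encoding.
    """
    sums = {}    # running sum of targets per category
    counts = {}  # running count of examples per category
    encoded = []
    for cat, y in zip(categories, targets):
        s = sums.get(cat, 0.0)
        n = counts.get(cat, 0)
        # smoothed mean over examples seen so far (prior acts as pseudo-count)
        encoded.append((s + prior_weight * prior) / (n + prior_weight))
        sums[cat] = s + y
        counts[cat] = n + 1
    return encoded

# Example: the first occurrence of each category falls back to the prior,
# later occurrences blend in the running target mean.
encoding = ordered_target_statistics(["a", "b", "a", "a", "b"], [1, 0, 1, 0, 1])
```

In the full algorithm, CatBoost averages such statistics over several random permutations of the training data; this sketch uses the input order as the single permutation.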
607 Citations
Factorized MultiClass Boosting
- Computer Science, ArXiv
- 2019
A new approach to the multiclass classification problem that decomposes it into a series of regression tasks solved with CART trees, reaching high-quality results in significantly less time without class re-balancing.
MP-Boost: Minipatch Boosting via Adaptive Feature and Observation Sampling
- Computer Science, 2021 IEEE International Conference on Big Data and Smart Computing (BigComp)
- 2021
Boosting methods are among the best general-purpose and off-the-shelf machine learning approaches, gaining widespread popularity. In this paper, we seek to develop a boosting method that yields…
Wide Boosting
- Computer Science, ArXiv
- 2020
This paper presents a simple adjustment to GB that allows the output of a GB model to have increased dimension prior to being fed into the loss and is thus "wider" than standard GB implementations.
Multi-Target XGBoostLSS Regression
- Computer Science
- 2022
An extension of XGBoostLSS is presented that models multiple targets and their dependencies in a probabilistic regression setting; it outperforms existing GBMs with respect to runtime and compares well in terms of accuracy.
Competitive Analysis of the Top Gradient Boosting Machine Learning Algorithms
- Computer Science, 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN)
- 2020
This research performs an exhaustive 360-degree comparative analysis of each of the four state-of-the-art gradient boosting algorithms viz.
Challenges and Opportunities of Building Fast GBDT Systems
- Computer Science, IJCAI
- 2021
This survey paper reviews the recent GBDT systems with respect to accelerations with emerging hardware as well as cluster computing, and compares the advantages and disadvantages of the existing implementations.
Gradient Boosting Machine with Partially Randomized Decision Trees
- Computer Science, 2021 28th Conference of Open Innovations Association (FRUCT)
- 2021
This work proposes applying partially randomized trees, which can be regarded as a special case of extremely randomized trees, to the gradient boosting machine in order to reduce its computational complexity.
Hyperboost: Hyperparameter Optimization by Gradient Boosting surrogate models
- Computer Science, ArXiv
- 2021
This paper proposes a new surrogate model based on gradient boosting, where quantile regression is used to provide optimistic estimates of the performance of an unobserved hyperparameter setting; this is combined with a distance metric between unobserved and observed hyperparameter settings to help regulate exploration.
agtboost: Adaptive and Automatic Gradient Tree Boosting Computations
- Computer Science, ArXiv
- 2020
agtboost is an R package implementing fast gradient tree boosting computations in a manner similar to other established frameworks such as xgboost and LightGBM, but with significant decreases in…
SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training
- Computer Science, ArXiv
- 2021
SAINT consistently improves performance over previous deep learning methods, and it even performs competitively with gradient boosting methods, including XGBoost, CatBoost, and LightGBM, on average over 30 benchmark datasets in regression, binary classification, and multi-class classification tasks.
References
Showing 1-10 of 19 references
CatBoost: unbiased boosting with categorical features
- Computer Science, NeurIPS
- 2018
This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit, provides a detailed analysis of the target-leakage problem it addresses, and demonstrates that the proposed algorithms solve it effectively, leading to excellent empirical results.
Fighting biases with dynamic boosting
- Computer Science, ArXiv
- 2017
Experimental results demonstrate that the open-source implementation of gradient boosting that incorporates the proposed algorithm produces state-of-the-art results, outperforming popular gradient boosting implementations.
XGBoost: A Scalable Tree Boosting System
- Computer Science, KDD
- 2016
This paper proposes a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning and provides insights on cache access patterns, data compression and sharding to build a scalable tree boosting system called XGBoost.
Winning The Transfer Learning Track of Yahoo!'s Learning To Rank Challenge with YetiRank
- Computer Science, Yahoo! Learning to Rank Challenge
- 2011
A novel pairwise method called YetiRank is introduced that modifies the gradient computation in Friedman's gradient boosting method and takes uncertainty in human judgements into account, allowing YetiRank to outperform many state-of-the-art learning-to-rank methods in offline experiments.
Adapting boosting for information retrieval measures
- Computer Science, Information Retrieval
- 2009
This work presents a new ranking algorithm that combines the strengths of two previous methods, boosted tree classification and LambdaRank, shows how to find the optimal linear combination of any two rankers, and uses this method to solve the line-search problem exactly during boosting.
GPU-acceleration for Large-scale Tree Boosting
- Computer Science, ArXiv
- 2017
A novel massively parallel algorithm for accelerating the decision tree building procedure on GPUs, a crucial step in Gradient Boosted Decision Tree (GBDT) and random forest training, which can be used as a drop-in replacement for histogram construction in popular tree boosting systems to improve their scalability.
Enhancing LambdaMART Using Oblivious Trees
- Computer Science, ArXiv
- 2016
Experimental results suggest that the performance of the current state-of-the-art learning-to-rank algorithm LambdaMART can be improved if standard regression trees are replaced by oblivious trees, and demonstrate that this substitution can improve performance by more than 2.2%.
Greedy function approximation: A gradient boosting machine.
- Computer Science
- 2001
A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
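The paradigm described above reduces, for least-squares loss, to repeatedly fitting a weak learner to the current residuals (the negative gradient) and adding it to the ensemble with a shrinkage factor. Below is a toy sketch of that special case, assuming depth-1 threshold "stumps" on a single feature; function names and the learning-rate value are illustrative, not from the paper.

```python
def fit_stump(x, r):
    """Fit the best single-threshold stump to residuals r by squared error."""
    best = None
    for t in sorted(set(x)):
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        err = sum((ri - (lm if xi <= t else rm)) ** 2 for xi, ri in zip(x, r))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def gradient_boost(x, y, n_stages=20, lr=0.5):
    """Least-squares gradient boosting: each stage fits the residuals."""
    f0 = sum(y) / len(y)                 # initial constant model
    stumps = []
    pred = [f0] * len(x)
    for _ in range(n_stages):
        # negative gradient of squared error = current residual
        residual = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, residual)
        stumps.append(h)
        pred = [pi + lr * h(xi) for pi, xi in zip(pred, x)]
    return lambda xi: f0 + lr * sum(h(xi) for h in stumps)

# Example: a step function is recovered from four points.
model = gradient_boost([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 1.0])
```

Substituting a different loss changes only the residual computation (e.g. sign of the residual for least absolute deviation), which is exactly the generality the paper's paradigm provides.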