QuickScorer: A Fast Algorithm to Rank Documents with Additive Ensembles of Regression Trees

@inproceedings{Lucchese2015QuickScorerAF,
  title={QuickScorer: A Fast Algorithm to Rank Documents with Additive Ensembles of Regression Trees},
  author={Claudio Lucchese and Franco Maria Nardini and Salvatore Orlando and Raffaele Perego and Nicola Tonellotto and Rossano Venturini},
  booktitle={Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2015}
}
  • Published 9 August 2015
Learning-to-Rank models based on additive ensembles of regression trees have proven to be very effective for ranking query results returned by Web search engines, a scenario where quality and efficiency requirements are very demanding. Unfortunately, the computational cost of these ranking models is high. Thus, several works already proposed solutions aiming at improving the efficiency of the scoring process by dealing with features and peculiarities of modern CPUs and memory hierarchies. In… 
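The bitvector-based scoring idea behind QuickScorer can be illustrated with a minimal, single-tree Python sketch. This is a hypothetical illustration, not the paper's actual cache-conscious data layout: the function name, the `conditions` structure, and the toy tree below are all invented for exposition. Each internal node whose test fails for the document ANDs a precomputed mask into a bitvector of candidate exit leaves; the leftmost surviving leaf is the exit leaf.

```python
# Hypothetical minimal sketch of the bitvector trick (illustrative names
# and toy tree, not the paper's actual ensemble-wide packed layout).

def score_one_tree(x, conditions, leaf_values):
    """Score one document x (a feature vector) against one tree.

    conditions: {feature: [(threshold, false_mask), ...]} with thresholds
    sorted ascending; false_mask has 0-bits for the leaves that become
    unreachable when the node's test x[feature] <= threshold is false.
    """
    n_leaves = len(leaf_values)
    v = (1 << n_leaves) - 1              # every leaf starts as a candidate
    for feature, nodes in conditions.items():
        for threshold, false_mask in nodes:
            if x[feature] <= threshold:
                break                    # thresholds sorted: later tests pass too
            v &= false_mask              # false node: prune its left subtree
    exit_leaf = (v & -v).bit_length() - 1    # leftmost surviving leaf
    return leaf_values[exit_leaf]

# Toy depth-2 tree, leaves numbered left to right (bit i = leaf i):
#   root tests f0 <= 0.5, left child f1 <= 0.3, right child f1 <= 0.7
conditions = {
    0: [(0.5, 0b1100)],    # root false  -> leaves 0,1 become unreachable
    1: [(0.3, 0b1110),     # left  false -> leaf 0 becomes unreachable
        (0.7, 0b1011)],    # right false -> leaf 2 becomes unreachable
}
leaf_values = [1.0, 2.0, 3.0, 4.0]
print(score_one_tree([0.8, 0.2], conditions, leaf_values))  # 3.0
```

Note that masks of *all* false nodes are applied, not just those on the root-to-leaf path; the construction guarantees the true exit leaf is still the leftmost set bit afterwards.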

Citations

Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees
  QuickScorer is presented, a new algorithm that adopts a novel cache-efficient representation of a given tree ensemble, performs an interleaved traversal by means of fast bitwise operations, and supports ensembles of oblivious trees.
QuickScorer: Efficient Traversal of Large Ensembles of Decision Trees
  QuickScorer is presented, a novel algorithm for the traversal of huge decision-tree ensembles that, thanks to a cache- and CPU-aware design, provides a speedup over the best competitors.
RapidScorer: Fast Tree Ensemble Evaluation by Maximizing Compactness in Data Level Parallelization
  RapidScorer is presented, a novel framework for speeding up the scoring process of industry-scale tree ensemble models without hurting the quality of scoring results, by introducing a modified run-length encoding called epitome to the bitvector representation of the tree nodes.
GPU-based Parallelization of QuickScorer to Speed-up Document Ranking with Tree Ensembles
  GPUScorer is a GPU-based parallelization of the state-of-the-art algorithm QuickScorer for scoring documents with tree ensembles; it takes advantage of the huge computational power of GPUs to perform tree ensemble traversal by evaluating multiple documents simultaneously.
Speeding-up Document Scoring with Tree Ensembles using CPU SIMD Extensions
  V-QuickScorer (vQS) is proposed, an algorithm that exploits SIMD vector extensions on modern CPUs to perform the traversal of the ensemble in parallel by evaluating multiple documents simultaneously; vQS outperforms competitors with speed-ups of up to a factor of 2.4x.
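The "multiple documents simultaneously" idea in the vQS summary above can be sketched in plain Python; the inner loop over documents stands in for the SIMD lanes that the real implementation fills with one document each. Function name, `conditions` layout, and the toy tree are all illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of lockstep multi-document scoring in the style of
# the vQS summary above; the per-document loop stands in for SIMD lanes.

def batch_score(docs, conditions, leaf_values):
    """conditions: {feature: [(threshold, false_mask), ...]} with thresholds
    ascending; false_mask zeroes the leaves cut off when a node's test
    x[feature] <= threshold is false for a document."""
    n_leaves = len(leaf_values)
    vs = [(1 << n_leaves) - 1 for _ in docs]   # one leaf bitvector per doc
    for feature, nodes in conditions.items():
        for threshold, false_mask in nodes:
            any_false = False
            for i, x in enumerate(docs):       # SIMD lanes in the real vQS
                if x[feature] > threshold:     # node's test is false
                    vs[i] &= false_mask
                    any_false = True
            if not any_false:
                break            # all docs also pass the remaining tests
    # the leftmost surviving leaf (lowest bit index) is each doc's exit leaf
    return [leaf_values[(v & -v).bit_length() - 1] for v in vs]

# Toy depth-2 tree: root f0 <= 0.5, left child f1 <= 0.3, right child f1 <= 0.7
conditions = {0: [(0.5, 0b1100)], 1: [(0.3, 0b1110), (0.7, 0b1011)]}
leaf_values = [1.0, 2.0, 3.0, 4.0]
print(batch_score([[0.8, 0.2], [0.3, 0.9]], conditions, leaf_values))
```

Note that early exit from a feature's threshold scan is only taken when *every* document in the batch passes the test, mirroring the per-lane masking a SIMD version must do.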
X-CLEaVER: Learning Ranking Ensembles by Growing and Pruning Trees
  This article proposes X-CLEaVER, an iterative meta-algorithm able to build more efficient and effective ranking ensembles, and analyzes several pruning strategies, showing that interleaving pruning and re-weighting phases during learning is more effective than applying a single post-learning optimization step.
Multicore/Manycore Parallel Traversal of Large Forests of Regression Trees
  It is shown that QuickScorer, which transforms the traversal of thousands of decision trees into linear accesses to array data structures, can be parallelized very effectively, achieving very interesting speedups.
Learning Early Exit Strategies for Additive Ranking Ensembles
  This work proposes LEAR, a novel learned technique aimed at reducing the average number of trees traversed by documents to accumulate their scores, thus reducing the overall query response time, and provides a comprehensive experimental evaluation on two public datasets.
Exploiting CPU SIMD Extensions to Speed-up Document Scoring with Tree Ensembles
  This paper proposes V-QuickScorer (vQS), which exploits SIMD extensions to vectorize the document scoring, i.e., to perform the ensemble traversal by evaluating multiple documents simultaneously.
Query-level Early Exit for Additive Learning-to-Rank Ensembles
  The main finding is that queries exhibit different behaviors as scores are accumulated during the traversal of the ensemble, and that query-level early stopping can remarkably improve ranking quality.

References

Showing 1-10 of 23 references
Early exit optimizations for additive machine learned ranking systems
  By proposing optimization strategies that allow short-circuiting score computations in additive learning systems, this paper is able to speed up the score computation process by more than four times with almost no loss in result quality.
Learning to efficiently rank
  This work presents a unified framework for jointly optimizing effectiveness and efficiency, proposes new metrics that capture the tradeoff between these two competing forces, and devises a strategy for automatically learning models that directly optimize the tradeoff.
A cascade ranking model for efficient ranked retrieval
  A novel cascade ranking model is formulated and developed which, unlike previous approaches, can simultaneously improve both top-k ranked effectiveness and retrieval efficiency; a novel boosting algorithm is presented for learning such cascades to directly optimize the tradeoff between effectiveness and efficiency.
Adapting boosting for information retrieval measures
  This work presents a new ranking algorithm that combines the strengths of two previous methods, boosted tree classification and LambdaRank, shows how to find the optimal linear combination for any two rankers, and uses this method to solve the line search problem exactly during boosting.
Runtime Optimizations for Tree-Based Machine Learning Models
  This paper focuses on optimizing the runtime performance of applying tree-based models to make predictions, specifically using gradient-boosted regression trees for learning to rank, and shows that this approach is significantly faster than standard implementations.
Cache-conscious runtime optimization for ranking ensembles
  This paper investigates data traversal methods for fast score calculation with a large ensemble and proposes a 2D blocking scheme for better cache utilization, with a simpler code structure compared to previous work.
Learning to rank for information retrieval
  Three major approaches to learning to rank are introduced, i.e., the pointwise, pairwise, and listwise approaches; the relationship between the loss functions used in these approaches and the widely used IR evaluation measures is analyzed, and the performance of these approaches on the LETOR benchmark datasets is evaluated.
Ranking under temporal constraints
  This paper proposes two temporally constrained ranking algorithms based on a class of probabilistic prediction models that can naturally incorporate efficiency constraints: one that makes independent feature selection decisions, and one that makes joint feature selection decisions.
Bagging gradient-boosted trees for high precision, low variance ranking models
  It is shown how the combination of bagging as a variance reduction technique and boosting as a bias reduction technique can result in very high-precision, low-variance ranking models.
Cumulated gain-based evaluation of IR techniques
  This article proposes several novel measures that compute the cumulative gain the user obtains by examining the retrieval result up to a given ranked position; test results indicate that the proposed measures credit IR methods for their ability to retrieve highly relevant documents and allow testing of statistical significance of effectiveness differences.