Adapting boosting for information retrieval measures

@article{Wu2010AdaptingBF,
  title={Adapting boosting for information retrieval measures},
  author={Qiang Wu and Christopher J. C. Burges and Krysta Marie Svore and Jianfeng Gao},
  journal={Information Retrieval},
  year={2010},
  volume={13},
  pages={254--270}
}
We present a new ranking algorithm that combines the strengths of two previous methods: boosted tree classification, and LambdaRank, which has been shown to be empirically optimal for a widely used information retrieval measure. We also show how to find the optimal linear combination for any two rankers, and we use this method to solve the line search problem exactly during boosting. In addition, we show that starting with a previously trained model, and boosting using its residuals, furnishes…
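The exact line search mentioned in the abstract — finding the best linear combination of two rankers — can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function names, the NDCG gain formulation, and the restriction of the mixing weight to [0, 1] are all assumptions.

```python
import numpy as np

def dcg(labels):
    """Discounted cumulative gain of a ranked list of relevance labels."""
    gains = (2.0 ** labels - 1.0) / np.log2(np.arange(2, len(labels) + 2))
    return gains.sum()

def ndcg_at_alpha(alpha, s1, s2, labels):
    """NDCG of the ranking induced by alpha*s1 + (1-alpha)*s2."""
    order = np.argsort(-(alpha * s1 + (1.0 - alpha) * s2))
    ideal = dcg(np.sort(labels)[::-1])
    return dcg(labels[order]) / ideal if ideal > 0 else 0.0

def best_linear_combination(s1, s2, labels):
    """Exact line search over alpha in [0, 1]: the induced ranking (and
    hence NDCG) only changes at alphas where two documents' combined
    scores cross, so it suffices to probe between those crossings."""
    n = len(s1)
    candidates = {0.0, 1.0}
    for i in range(n):
        for j in range(i + 1, n):
            d1, d2 = s1[i] - s1[j], s2[i] - s2[j]
            if d1 != d2:
                a = -d2 / (d1 - d2)  # alpha where docs i and j tie
                if 0.0 < a < 1.0:
                    candidates.add(a)
    cs = sorted(candidates)
    # evaluate between consecutive crossings, plus the crossings themselves
    probes = [(a + b) / 2.0 for a, b in zip(cs, cs[1:])] + cs
    return max(probes, key=lambda a: ndcg_at_alpha(a, s1, s2, labels))
```

Because NDCG is piecewise constant in the mixing weight, checking these finitely many crossing points evaluates the objective everywhere, which is what makes the search exact rather than a grid approximation.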

Scalability and Performance of Random Forest based Learning-to-Rank for Information Retrieval

This research investigates random forest based LtR algorithms, develops methods for estimating the bias and variance of rank-learning algorithms, and examines their empirical behavior against the parameters of the learning algorithm.

Learning to Rank on a Cluster using Boosted Decision Trees

This work investigates the problem of learning to rank on a cluster, using Web search data composed of 140,000 queries and approximately fourteen million URLs and a boosted tree ranking algorithm called LambdaMART, and implements a method for improving the speed of training when the training data fits in main memory on a single machine.

BoostingTree: parallel selection of weak learners in boosting, with application to ranking

A strategy is proposed that builds several sequences of weak hypotheses in parallel, extends the ones that are likely to yield a good model, and otherwise converges to performance similar to that of the original boosting algorithms.

A Robust Ranking Methodology Based on Diverse Calibration of AdaBoost

This paper introduces a learning to rank approach to subset ranking based on multi-class classification that outperformed many standard ranking algorithms on the LETOR benchmark datasets and is less prone to overfitting.

Active Learning to Rank Method for Documents Retrieval

A new active learning to rank algorithm based on boosting is proposed for active ranking functions, introducing unlabeled data into the learning process and evaluating the performance of pairwise and listwise approaches.

Distributed Machine Learning

This work investigates the problem of learning to rank on a cluster using Web search data composed of 140,000 queries and approximately fourteen million URLs, and implements a method for improving the speed of training when the training data fits in main memory on a single machine by distributing the vertex split computations of the decision trees.

Ranking function adaptation with boosting trees

A new approach called tree-based ranking function adaptation (Trada) is proposed to effectively utilize data sources for training cross-domain ranking functions and is extended to utilize the pairwise preference data from the target domain to further improve the effectiveness of adaptation.

Two-Stage Learning to Rank for Information Retrieval

Empirical evaluation using two web collections unequivocally demonstrates that the proposed two-stage framework, being able to learn its model from more relevant documents, outperforms current learning to rank approaches.

Learning to rank, a supervised approach for ranking of documents

This thesis investigates state-of-the-art machine learning methods for ranking, known as learning to rank, to explore whether they can be used in enterprise search, which involves less data and fewer document features than web-based search.

Direct optimization of ranking measures for learning to rank models

A novel learning algorithm is presented, DirectRank, which directly and exactly optimizes ranking measures without resorting to any upper bounds or approximations, and a probabilistic framework for document-query pairs is constructed to maximize the likelihood of the objective permutation of top-$\tau$ documents.
...

References

Showing 1-10 of 39 references

A General Boosting Method and its Application to Learning Ranking Functions for Web Search

We present a general boosting method extending functional gradient boosting to optimize complex loss functions that are encountered in many machine learning problems. Our approach is based on…

On the local optimality of LambdaRank

It is shown that LambdaRank, which smoothly approximates the gradient of the target measure, can be adapted to work with four popular IR target evaluation measures using the same underlying gradient construction.

Model Adaptation via Model Interpolation and Boosting for Web Search Ranking

This paper explores two classes of model adaptation methods for Web search ranking: Model Interpolation and error-driven learning approaches based on a boosting algorithm. The results show that model…

Learning to Rank Using Classification and Gradient Boosting

This work considers the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval, and proposes using the Expected Relevance to convert the class probabilities into ranking scores.

McRank: Learning to Rank Using Multiple Classification and Gradient Boosting

This work considers the DCG criterion (discounted cumulative gain), a standard quality measure in information retrieval, and proposes using the Expected Relevance to convert class probabilities into ranking scores.
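The Expected Relevance conversion described in the two entries above can be sketched in a few lines: each document's predicted class probabilities over relevance grades are collapsed into a single score by taking the expectation. The grade range, the example probabilities, and the function name are illustrative assumptions, not values from the paper.

```python
import numpy as np

def expected_relevance(class_probs):
    """Convert per-document class probabilities over relevance grades
    0..K-1 into a single ranking score: score = sum_k k * P(grade k)."""
    grades = np.arange(class_probs.shape[1])
    return class_probs @ grades

# Three hypothetical documents, relevance grades 0..3
probs = np.array([
    [0.10, 0.20, 0.30, 0.40],   # leans highly relevant
    [0.70, 0.20, 0.10, 0.00],   # leans non-relevant
    [0.25, 0.25, 0.25, 0.25],   # uniform over grades
])
scores = expected_relevance(probs)
ranking = np.argsort(-scores)   # rank by descending expected relevance
```

The expectation preserves the ordering information in the full class distribution while producing the scalar score that ranking by DCG requires.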

Linear discriminant model for information retrieval

Results show that in most test sets, LDM significantly outperforms the state-of-the-art language modeling approaches and the classical probabilistic retrieval model, and that it is more appropriate to train LDM using AP rather than likelihood if the IR system is graded on AP.

A support vector method for optimizing average precision

This work presents a general SVM learning algorithm that efficiently finds a globally optimal solution to a straightforward relaxation of MAP, and shows its method to produce statistically significant improvements in MAP scores.

Trada: tree based ranking function adaptation

Tree adaptation assumes that ranking functions are trained with regression-tree based modeling methods, such as Gradient Boosting Trees, and takes such a ranking function from one domain and tunes its tree-based structure with a small amount of training data from the target domain.

Learning to rank: from pairwise approach to listwise approach

It is proposed that learning to rank should adopt the listwise approach in which lists of objects are used as 'instances' in learning, and introduces two probability models, respectively referred to as permutation probability and top k probability, to define a listwise loss function for learning.
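A minimal sketch of the listwise idea above, assuming the top-1 (softmax) probability model and a ListNet-style cross-entropy between the distributions induced by ground-truth and predicted scores; the function names are hypothetical.

```python
import numpy as np

def top_one_probs(scores):
    """Top-1 probability of each document under the softmax model
    used to define the listwise loss."""
    e = np.exp(scores - scores.max())   # shift for numerical stability
    return e / e.sum()

def listwise_loss(true_scores, pred_scores):
    """Cross entropy between the top-1 distributions induced by the
    ground-truth scores and the model scores (ListNet-style)."""
    p = top_one_probs(true_scores)
    q = top_one_probs(pred_scores)
    return -(p * np.log(q)).sum()
```

Because the loss compares whole score lists rather than isolated pairs, the list itself is the training instance, which is the defining contrast with the pairwise approach.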

A general language model for information retrieval

A new language model for information retrieval is presented, which is based on a range of data smoothing techniques, including the Good-Turing estimate, curve-fitting functions, and model combinations, and can be easily extended to incorporate probabilities of phrases such as word pairs and word triples.
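As a rough illustration of smoothing by model combination, here is a query-likelihood scorer with Jelinek-Mercer interpolation standing in for the paper's specific smoothing techniques; the function name, parameter `lam`, and toy data are all assumptions for the sketch.

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.5):
    """Score a document by smoothed query log-likelihood:
    log P(q|d) = sum_w log[ lam * P_ml(w|d) + (1-lam) * P(w|collection) ].
    Interpolating with the collection model keeps unseen query words
    from zeroing out the whole document score."""
    d, c = Counter(doc), Counter(collection)
    dlen, clen = len(doc), len(collection)
    score = 0.0
    for w in query:
        p = lam * d[w] / dlen + (1.0 - lam) * c[w] / clen
        if p == 0.0:                 # word absent from the collection
            return float("-inf")
        score += math.log(p)
    return score
```

Swapping the interpolation weight for Good-Turing counts or a fitted discounting curve changes only the per-word probability estimate; the query-likelihood scaffolding stays the same.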