ApproxRank: Estimating Rank for a Subgraph

Abstract

Customized semantic query answering, personalized search, focused crawlers and localized search engines frequently focus on ranking the pages contained within a subgraph of the global Web graph. The challenge for these applications is to compute PageRank-style scores efficiently on the subgraph, i.e., the ranking must reflect the global link structure of the Web graph but it must do so without paying the high overhead associated with a global computation. We propose a framework of an exact solution and an approximate solution for computing ranking on a subgraph.The IdealRank algorithm is an exact solution with the assumption that the scores of external pages are known. We prove that the IdealRank scores for pages in the subgraph converge. Since the PageRank-style scores of external pages may not typically be available, we propose the ApproxRank algorithm to estimate scores for the subgraph. Both IdealRank and ApproxRank represent the set of external pages with an external node $\Lambda$ and extend the subgraph with links to $\Lambda$. They also modify the PageRank-style transition matrix with respect to $\Lambda$.We analyze the $L_1$ distance between IdealRank scores and ApproxRank scores of the subgraph and show that it is within a constant factor of the $L_1$ distance of the external pages (e.g., the true PageRank scores and uniform scores assumed by ApproxRank). We compare ApproxRank and a stochastic complementation approach (SC), a current best solution for this problem, on different types of subgraphs. ApproxRank has similar or superior performance to SC and typically improves on the runtime performance of SC by an order of magnitude or better. We demonstrate that ApproxRank provides a good approximation to PageRank for a variety of subgraphs.

DOI: 10.1109/ICDE.2009.108

Extracted Key Phrases

12 Figures and Tables

Cite this paper

@article{Wu2009ApproxRankER, title={ApproxRank: Estimating Rank for a Subgraph}, author={Yao Wu and Louiqa Raschid}, journal={2009 IEEE 25th International Conference on Data Engineering}, year={2009}, pages={54-65} }