Statistical ranking and combinatorial Hodge theory

Authors: Xiaoye Jiang, Lek-Heng Lim, Y. Yao, Yinyu Ye
Journal: Mathematical Programming
We propose a technique that we call HodgeRank for ranking data that may be incomplete and imbalanced, characteristics common in modern datasets coming from e-commerce and internet applications. We are primarily interested in cardinal data based on scores or ratings though our methods also give specific insights on ordinal data. From raw ranking data, we construct pairwise rankings, represented as edge flows on an appropriate graph. Our statistical ranking method exploits the graph Helmholtzian… 
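The abstract is truncated, so the following is only a minimal sketch of the general idea it describes: pairwise comparisons are stored as a flow on the edges of a graph, and a global score vector is fitted to that flow by least squares. The function name `least_squares_ranking`, the `(i, j, diff)` input format, and the unweighted objective are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def least_squares_ranking(n_items, comparisons):
    """Fit scores s so that s[j] - s[i] matches each observed difference.

    comparisons: list of (i, j, diff), read as "item j beats item i by diff".
    This input format is a hypothetical convention chosen for the sketch.
    """
    m = len(comparisons)
    grad = np.zeros((m, n_items))  # edge-by-vertex incidence (graph gradient)
    flow = np.zeros(m)             # observed pairwise differences (edge flow)
    for k, (i, j, diff) in enumerate(comparisons):
        grad[k, i] = -1.0
        grad[k, j] = 1.0
        flow[k] = diff
    # Minimise ||grad @ s - flow||_2. lstsq returns the minimum-norm solution,
    # which sums to zero when the comparison graph is connected.
    scores, *_ = np.linalg.lstsq(grad, flow, rcond=None)
    residual = flow - grad @ scores  # part of the flow no global score explains
    return scores, residual

# Three items with mutually consistent comparisons.
scores, residual = least_squares_ranking(
    3, [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]
)
```

A residual that is far from zero signals that the pairwise data contain inconsistencies that no single global ranking can account for.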
Enhanced statistical rankings via targeted data collection
A framework is proposed to identify data which, when added to the current dataset, maximally increases the Fisher information of the ranking; the addition of a small number of well-chosen pairwise comparisons can significantly increase the Fisher informativeness of the ranking.
Hodge Decomposition of Paired Comparison Flows in Click-through Data
The studies show that directly applying Joachims' models to typical industrial data collected from a commercial search engine is not appropriate: doing so tends to reverse the order of search engines and thus leaves a larger deviation from Human Rating Scores than even random ranking.
Learning from ranking data : theory and methods
A method to upper bound the Kendall's tau distance between any consensus candidate and a Kemeny consensus, on any dataset, and an approach to approximate any distribution on rankings by a distribution exhibiting a specific type of sparsity are proposed.
Optimal data collection for informative rankings expose well-connected graphs
This paper studies the Yahoo! Movie user rating data set and demonstrates that the addition of a small number of well-chosen pairwise comparisons can significantly increase the Fisher informativeness of the ranking.
Least Squares Ranking on Graphs, Hodge Laplacians, Time Optimality, and Iterative Methods
The least squares problems of ranking are a 2-norm version of the optimal homologous chain problem of computational topology, and it is shown that if a graph is the 1-skeleton of a cell complex on a compact surface, the second least squares system matrix is also SDD, which implies optimality via KMP.
A Statistical Convergence Perspective of Algorithms for Rank Aggregation from Pairwise Data
This paper shows that, under a 'time-reversibility' or Bradley-Terry-Luce (BTL) condition on the distribution, the rank centrality (PageRank) and least squares (HodgeRank) algorithms both converge to an optimal ranking.
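As a rough illustration of the rank centrality idea mentioned in this summary, the sketch below builds a random walk that moves from item i to item j in proportion to how often j beats i, then reads scores off its stationary distribution. The helper names `btl_matrix` and `rank_centrality`, the use of exact BTL probabilities instead of sampled comparisons, and the fixed iteration count are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def btl_matrix(weights):
    """Exact BTL probabilities: entry [i, j] is P(j beats i) = w_j / (w_i + w_j)."""
    w = np.asarray(weights, dtype=float)
    return w[None, :] / (w[:, None] + w[None, :])

def rank_centrality(win_prob, n_steps=5000):
    """Stationary distribution of a random walk that drifts toward winners.

    win_prob[i, j]: fraction of comparisons between i and j that j won.
    """
    n = win_prob.shape[0]
    P = win_prob / (n - 1)               # scale so off-diagonal row sums stay <= 1
    idx = np.diag_indices(n)
    P[idx] = 0.0                         # no comparison of an item with itself
    P[idx] = 1.0 - P.sum(axis=1)         # self-loops make each row sum to 1
    pi = np.full(n, 1.0 / n)
    for _ in range(n_steps):             # plain power iteration
        pi = pi @ P
    return pi / pi.sum()

true_weights = np.array([1.0, 2.0, 4.0])
scores = rank_centrality(btl_matrix(true_weights))
```

With exact BTL probabilities the chain satisfies detailed balance with respect to the weights, so `scores` should be proportional to `true_weights`; with finitely many sampled comparisons the match is only approximate.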
Statistical ranking using the $l^{1}$-norm on graphs
A fast graph-cut approach for finding $\epsilon$-optimal solutions, which has been used successfully in image processing and computer vision problems, is described and its efficacy at finding solutions with sparse residual is demonstrated.
Optimal Data Collection for Improved Rankings Expose Well-Connected Graphs
This paper argues that the NCAA could improve its notoriously poor rankings simply by scheduling more out-of-conference games, using spectral clustering methods to identify highly connected communities within the division.
The rankability of weighted data from pairwise comparisons
This paper extends rankability methods to weighted data, in which an item may dominate another by any finite amount, presents combinatorial approaches to a weighted rankability measure, and applies the new measure to several weighted datasets.
Multiresolution analysis of ranking data
A new representation for the data is introduced, which by construction overcomes the aforementioned statistical and computational challenges, offering a natural and efficient framework for the analysis of incomplete rankings.


Magnitude-preserving ranking algorithms
This paper describes and analyzes several algorithms for ranking when one wishes not just to accurately predict pairwise ordering but also to preserve the magnitude of the preferences or the difference between ratings, extending previously known stability results to non-bipartite ranking and magnitude of preference-preserving algorithms.
An Efficient Boosting Algorithm for Combining Preferences
This work describes and analyzes an efficient algorithm called RankBoost for combining preferences based on the boosting approach to machine learning, and gives theoretical results describing the algorithm's behavior both on the training data and on new test data not seen during training.
Convex Rank Tests and Semigraphoids
The methods refine existing rank tests of nonparametric statistics, such as the sign test and the runs test, and are useful for exploratory analysis of ordinal data. Of particular interest are graphical tests, which correspond to both graphical models and graph associahedra.
Methodologies and Algorithms for Group-Rankings Decision
A new paradigm is presented that uses an optimization framework to address major shortcomings of current group-ranking models and is solvable in polynomial time.
Aggregating inconsistent information: Ranking and clustering
This work almost settles a long-standing conjecture of Bang-Jensen and Thomassen and shows that unless NP⊆BPP, there is no polynomial time algorithm for the problem of minimum feedback arc set in tournaments.
Topology of random clique complexes
Empirical stationary correlations for semi-supervised learning on graphs
Ya Xu, Computer Science, 2009
This paper proves that many semi-supervised learning proposals are equivalent to kriging predictors based on a fixed covariance matrix driven by the link structure of the graph and proposes a data-driven estimator of the correlation structure that exploits patterns among the observed response values.
Computing Betti Numbers via Combinatorial Laplacians
The Laplacian and power method is used to compute Betti numbers of simplicial complexes, which has a number of advantages over other methods, both in theory and in practice, but its running time depends on a ratio, ν, of eigenvalues which the authors have yet to understand fully.
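To make the link between Laplacians and Betti numbers concrete, here is a small sketch that counts the zero eigenvalues of the combinatorial Laplacians of a tiny complex. It uses a dense rank computation in place of the power method described in the summary, and the function name `betti_numbers` and the hand-written boundary matrices are assumptions made for the example.

```python
import numpy as np

def betti_numbers(d1, d2):
    """Betti numbers b0, b1 as kernel dimensions of combinatorial Laplacians.

    d1: boundary matrix from edges to vertices (n_vertices x n_edges)
    d2: boundary matrix from triangles to edges (n_edges x n_triangles)
    Note: dense matrix_rank is used here, not the power method.
    """
    L0 = d1 @ d1.T                 # graph Laplacian on vertices
    L1 = d1.T @ d1 + d2 @ d2.T     # Laplacian on edges
    b0 = L0.shape[0] - np.linalg.matrix_rank(L0)
    b1 = L1.shape[0] - np.linalg.matrix_rank(L1)
    return int(b0), int(b1)

# Triangle on vertices 0, 1, 2 with edges ordered (0,1), (1,2), (0,2).
d1 = np.array([[-1.0,  0.0, -1.0],
               [ 1.0, -1.0,  0.0],
               [ 0.0,  1.0,  1.0]])

hollow = np.zeros((3, 0))                 # no 2-dimensional face
filled = np.array([[1.0], [1.0], [-1.0]]) # boundary of the face (0,1,2)

hollow_betti = betti_numbers(d1, hollow)  # one component, one loop
filled_betti = betti_numbers(d1, filled)  # one component, loop filled in
```

The hollow triangle keeps its one-dimensional hole, while adding the face removes it, which is the behaviour the kernel of the edge Laplacian is meant to capture.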
A Scaling Method for Priorities in Hierarchical Structures