Top-k Queries Over Uncertain Scores

Authors: Qing Hong Liu, D. Basu, Talel Abdessalem, Stéphane Bressan
Published in: OTM Conferences
Modern recommendation systems leverage some form of collaborative, user-generated or crowdsourced collection of information. For instance, services like TripAdvisor, Airbnb and HungryGoWhere rely on user-generated content to describe and classify hotels, vacation rentals and restaurants. Because this information is collected independently, its multiplicity, diversity and varying quality result in uncertainty. Objects, such as the services offered by hotels, vacation…
1 Citation
How to Find the Best Rated Items on a Likert Scale and How Many Ratings Are Enough
An algorithm, Pundit, is devised that computes the n-k best-rated items, i.e., the n items that are most likely to be the top-k in a ranking constructed from these ratings.


Top-k queries on uncertain data: on score distribution and typical answers
The need to present the score distribution of top-k vectors, so that the user can choose between results along the score and probability dimensions, is demonstrated, and a number of typical vectors that effectively sample this distribution are proposed.
Supporting ranking queries on uncertain and incomplete data
A new probabilistic model, based on partial orders, is presented to encapsulate the space of possible rankings originating from score uncertainty to solve the problem of rank aggregation in partial orders under two widely adopted distance metrics.
Top-k Query Processing in Uncertain Databases
A framework that encapsulates a state space model and efficient query processing techniques to tackle the challenges of uncertain data settings is constructed and it is proved that the techniques are optimal in terms of the number of accessed tuples and materialized search states.
Uncertainty in Crowd Data Sourcing Under Structural Constraints
Applications extracting data from crowdsourcing platforms must deal with the uncertainty of crowd answers in two different ways: first, by deriving estimates of the correct value from the answers;
Using the crowd for top-k and group-by queries
The problem of evaluating top-k and group-by queries using the crowd to answer either type or value questions is studied, and efficient algorithms that are guaranteed to achieve good results with high probability are given.
On Pruning for Top-K Ranking in Uncertain Databases
This paper shows a mathematical manipulation of possible worlds which reveals key insights in the part of computation that may be pruned and how to achieve it in a systematic fashion, which leads to concrete pruning methods for a wide range of ranking functions.
Semantics of Ranking Queries for Probabilistic Data and Expected Ranks
This work is able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query, and provides efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty.
Ranking queries on uncertain data: a probabilistic threshold approach
An efficient exact algorithm, a fast sampling algorithm, and a Poisson-approximation-based algorithm are presented for answering probabilistic threshold top-k queries on uncertain data, which return the uncertain records whose probability of being in the top-k list is at least p.
Efficient Top-k Query Evaluation on Probabilistic Data
This paper describes a novel approach that efficiently computes and ranks the top-k answers to a SQL query on a probabilistic database: it runs several Monte Carlo simulations in parallel, one for each candidate answer, and approximates each probability only to the extent needed to compute the top-k answers correctly.
Semantics of Ranking Queries for Probabilistic Data
This work proposes an intuitive new ranking definition based on the observation that the ranks of a tuple across all possible worlds represent a well-founded rank distribution, and is able to prove that the expected rank, median rank, and quantile rank satisfy all these properties for a ranking query.
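
Several of the summaries above refer to possible worlds, expected ranks, and the probability that a record appears in the top-k list. The following is a minimal sketch of those notions, not the method of any paper listed here. It assumes that each object has an independent, discrete score distribution (attribute-level uncertainty) and it enumerates every possible world exactly, which is only feasible for tiny inputs. The names `possible_worlds`, `rank_statistics` and the sample data are hypothetical and chosen for illustration only.

```python
from itertools import product


def possible_worlds(dists):
    """Yield (scores, probability) for every combination of score outcomes.

    `dists` maps an object name to a list of (score, probability) pairs.
    Assumption: the score distributions of different objects are independent.
    """
    names = list(dists)
    for combo in product(*(dists[name] for name in names)):
        scores = {}
        prob = 1.0
        for name, (score, p) in zip(names, combo):
            scores[name] = score
            prob *= p
        yield scores, prob


def rank_statistics(dists, k):
    """Return (expected_rank, topk_probability) for every object.

    Ranks are 0-based, so rank 0 is the best-scored object in a world.
    Ties are broken by object name to keep the ranking deterministic.
    """
    expected_rank = {name: 0.0 for name in dists}
    topk_prob = {name: 0.0 for name in dists}
    for scores, prob in possible_worlds(dists):
        ordered = sorted(scores, key=lambda name: (-scores[name], name))
        for rank, name in enumerate(ordered):
            expected_rank[name] += prob * rank
            if rank < k:
                topk_prob[name] += prob
    return expected_rank, topk_prob


# Hypothetical ratings: A and C are uncertain, B is known exactly.
sample = {
    "A": [(5, 0.5), (3, 0.5)],
    "B": [(4, 1.0)],
    "C": [(2, 0.8), (6, 0.2)],
}

if __name__ == "__main__":
    ranks, probs = rank_statistics(sample, k=1)
    for name in sample:
        print(name, round(ranks[name], 3), round(probs[name], 3))
```

A probabilistic threshold query in this setting would keep the objects whose top-k probability is at least p, while an expected-rank query would sort the objects by `expected_rank`. Exact enumeration grows exponentially with the number of uncertain objects, which is the cost that sampling, pruning and approximation techniques aim to avoid.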