TopCrowd - Efficient Crowd-enabled Top-k Retrieval on Incomplete Data

Christian Nieke, Ulrich Güntzer, Wolf-Tilo Balke
Building databases and information systems over data extracted from heterogeneous sources like the Web poses a severe challenge: most data is incomplete and thus difficult to process in structured queries. This is especially true for sophisticated query techniques like Top-k querying where rankings are aggregated over several sources. The intelligent combination of efficient data processing algorithms with crowdsourced database operators promises to alleviate the situation. Yet the…

A Method of A-BAT Algorithm Based Query Optimization for Crowd Sourcing System

A novel A-BAT algorithm is proposed that substantially improves convergence speed, accuracy, and latency; the algorithm uses a random-walk phase to mine information from the crowd.

Hybrid human-machine information systems: Challenges and opportunities

Quest for the Gold Par: Minimizing the Number of Gold Questions to Distinguish between the Good and the Bad

This paper identifies custom-tailored numbers of gold questions per worker for managing the cost/quality balance by employing probabilistic models, namely Bayesian belief networks and certainty factor models, and proves that the actual number of gold questions per worker can indeed be assessed.

Fine-Tuning Gold Questions in Crowdsourcing Tasks using Probabilistic and Siamese Neural Network Models

Deep learning techniques are used to identify custom-tailored numbers of gold questions per worker for individually managing the cost/quality balance, and it is proved that the actual number of gold questions per worker can indeed be assessed.

Query Processing over Incomplete Databases

Incomplete data is part of life and almost all areas of scientific studies. Users tend to skip certain fields when they fill out online forms; participants choose to ignore sensitive quest...



Pushing the Boundaries of Crowd-enabled Databases with Query-driven Schema Expansion

This paper extends crowd-enabled databases by flexible query-driven schema expansion, allowing the addition of new attributes to the database at query time, and leverages the user-generated data found in the Social Web to build perceptual spaces.


CrowdDB is a hybrid database system that automatically uses crowdsourcing to integrate human input for processing queries that a normal database system cannot answer.

Query Processing over Incomplete Autonomous Databases

This work introduces a novel query rewriting and optimization framework QPIAD, which involves reformulating the user query based on mined correlations among the database attributes, and develops methods for mining attribute correlations, value distributions, and selectivity estimates.

Evaluating top-k queries over web-accessible databases

This paper studies how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces.

Progressive distributed top-k retrieval in peer-to-peer networks

This paper discusses the benefits of best match/top-k queries in the context of distributed peer-to-peer information infrastructures and shows how to extend the limited query processing in current peer-to-peer networks by allowing the distributed processing of top-k queries, while maintaining a minimum of data traffic.

Using the crowd for top-k and group-by queries

The problem of evaluating top-k and group-by queries using the crowd to answer either type or value questions is studied, and efficient algorithms that are guaranteed to achieve good results with high probability are given.

Human-powered Sorts and Joins

This paper describes how MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task, and proposes a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them.

So who won?: dynamic max discovery with the crowd

It is shown that in a crowdsourcing DB system, the optimal solution to both problems is NP-hard, and heuristic functions are provided to select the maximum given evidence, and to select additional votes.

Towards efficient multi-feature queries in heterogeneous environments

This work presents a new algorithm, called Stream-Combine, for processing multi-feature queries on heterogeneous data sources that is self-adapting to different data distributions and to the specific kind of the combining function.