Crowdsourcing enables programmers to incorporate "human computation" as a building block in algorithms that cannot be fully automated, such as text analysis and image recognition. Similarly, humans… (More)
Given a large set of data items, we consider the problem of filtering them based on a set of properties that can be verified by humans. This problem is commonplace in crowdsourcing applications, and… (More)
Data analysts often build visualizations as the first step in their analytical workflow. However, when working with high-dimensional datasets, identifying visualizations that show relevant or desired… (More)
Relational databases have limited support for data collaboration, where teams collaboratively curate and analyze large datasets. Inspired by software version control systems like git, we propose (a)… (More)
We consider a crowdsourcing database system that may cleanse, populate, or filter its data by using human workers. Just like a conventional DB system, such a crowdsourcing DB system requires data… (More)
Deco is a system that enables declarative crowdsourcing: answering SQL queries posed over data gathered from the crowd as well as existing relational data. Deco implements a novel push-pull hybrid… (More)
2012 IEEE 28th International Conference on Data…
2012
Fuzzy/similarity joins have been widely studied in the research community and extensively used in real-world applications. This paper proposes and evaluates several algorithms for finding all pairs… (More)
In entity matching, a fundamental issue while training a classifier to label pairs of entities as either duplicates or non-duplicates is the one of selecting informative training examples. Although… (More)
We study the problem of making recommendations when the objects to be recommended must also satisfy constraints or requirements. In particular, we focus on course recommendations: the courses taken… (More)
Given a batch of human computation tasks, a commonly ignored aspect is how the price (i.e., the reward paid to human workers) of these tasks must be set or varied in order to meet latency or cost… (More)