Hive - A Warehousing Solution Over a Map-Reduce Framework
TLDR
We present Hive, an open-source data warehousing solution built on top of Hadoop.
  • 1,700
  • 214
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
TLDR
We propose a simple algorithm called delay scheduling: when the job that should be scheduled next according to fairness cannot launch a local task, it waits a small amount of time, letting other jobs launch tasks instead.
  • 1,433
  • 135
Hive - a petabyte scale data warehouse using Hadoop
TLDR
The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive.
  • 926
  • 123
Data warehousing and analytics infrastructure at Facebook
TLDR
Scalable analysis on large data sets has been core to the functions of a number of teams at Facebook, both engineering and non-engineering.
  • 408
  • 26
Job Scheduling for Multi-User MapReduce Clusters
TLDR
We found that traditional scheduling algorithms can perform very poorly in MapReduce, degrading throughput and response time by factors of 2-10, due to two aspects of the setting: data locality (the need to run computations near the data) and the dependence between map and reduce tasks.
  • 384
  • 21
Fault-Tolerant Rate-Monotonic Scheduling
TLDR
In this paper, we present a recovery scheme which can be used to tolerate faults during the execution of preemptive real-time tasks.
  • 93
  • 5
Apache Hadoop goes realtime at Facebook
TLDR
This paper describes the reasons why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort and discusses the application's requirements for consistency, availability, partition tolerance, data model and scalability.
  • 115
  • 3
Schedulability Tests for Fixed Priority Scheduling
Tasks can be classified as either periodic, which execute once every fixed number of time units, or aperiodic. Sporadic tasks are a special case of aperiodic tasks which are guaranteed to have a minimum…
  • 2