The Hive Project
Cameron Rose
Corpus ID: 86619595
Gerenuk: thin computation over big native data using speculative program transformation
Gerenuk is a compiler and runtime that enables a JVM-based data-parallel system to achieve near-native efficiency by transforming a set of statements in the system for direct execution over inlined native bytes.
A Generic and Scalable Pipeline for Large-Scale Analytics of Continuous Aircraft Engine Data
A generic and scalable pipeline, based on Hadoop and Spark, for large-scale analytics of operational data from a recent type of aircraft engine, oriented towards health-monitoring applications; it enables domain experts to scale their algorithms and extract features from tens of thousands of flights stored on a cluster.
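A minimal sketch of the per-flight feature-extraction step such a pipeline might run; the flight records, field names (`egt`), and feature names here are invented for illustration, and a real deployment would distribute this map step with Spark rather than plain Python:

```python
# Hypothetical sketch: reduce each flight's raw engine time series to a
# few summary features -- the kind of map step Spark would parallelize.

def extract_features(flight):
    """Summarize one flight's temperature samples (hypothetical schema)."""
    temps = flight["egt"]  # exhaust gas temperature samples
    return {
        "flight_id": flight["id"],
        "egt_max": max(temps),
        "egt_mean": sum(temps) / len(temps),
    }

flights = [
    {"id": "F001", "egt": [610.0, 645.5, 630.2]},
    {"id": "F002", "egt": [598.4, 612.7, 605.1]},
]

# In Spark this would be sc.parallelize(flights).map(extract_features);
# the built-in map() keeps the sketch self-contained.
features = list(map(extract_features, flights))
```

At cluster scale the same function is applied unchanged to tens of thousands of flights; only the driver that feeds it records differs.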
Compiler and Runtime Supports for High-Performance, Scalable Big Data Systems (Research Statement)
Throughout my Ph.D., I have designed and developed a series of system optimizations that enable scalable Big Data processing, including a new programming model and several novel compiler and runtime supports.
Basic algorithms for bee hive monitoring and laser-based mite control
The objective of this work is to implement a beehive monitoring system that tracks essential parameters of a bee hive and additionally includes an image recognition algorithm to observe the degree of infestation with Varroa mites.
Optimization of Multiple Queries for Big Data with Apache Hadoop/Hive
  • Varun Garg
  • 2015 International Conference on Computational Intelligence and Communication Networks (CICN)
This manuscript proposes the use of the Multi-Query Optimization (MQO) technique to enhance the overall performance of Hadoop/Hive, transforming a set of interrelated HiveQL queries into new global queries that produce the same results in significantly shorter total execution times.
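The core idea behind MQO can be shown with a toy example, sketched here in plain Python rather than HiveQL; the table contents and the two query shapes are invented, but the fusion pattern (one scan answering several queries) is the technique the paper applies:

```python
# Toy multi-query optimization: fuse two aggregations over the same
# "table" into a single scan instead of scanning once per query.

rows = [
    {"dept": "a", "salary": 100},
    {"dept": "b", "salary": 200},
    {"dept": "a", "salary": 300},
]

def run_separately(rows):
    """Naive plan: one full table scan per query."""
    total = sum(r["salary"] for r in rows)              # scan 1
    count_a = sum(1 for r in rows if r["dept"] == "a")  # scan 2
    return total, count_a

def run_fused(rows):
    """Global query: both answers computed in a single pass."""
    total, count_a = 0, 0
    for r in rows:
        total += r["salary"]
        if r["dept"] == "a":
            count_a += 1
    return total, count_a

# Both plans agree; the fused plan reads the data only once.
assert run_separately(rows) == run_fused(rows)
```

In Hive the win is much larger, since each avoided scan is an avoided MapReduce pass over HDFS data.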
BigOP: Generating Comprehensive Big Data Workloads as a Benchmarking Framework
BigOP, an end-to-end system benchmarking framework featuring the abstraction of representative operation sets, workload patterns, and prescribed tests, is presented; it is part of an open-source big data benchmarking project, BigDataBench.
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
This year's SOSP program includes 30 papers, and touches on a wide range of computer systems topics, from kernels to big data, from responsiveness to correctness, and from devices to data centers.
Improving Scheduling in Heterogeneous Grid and Hadoop Systems
This thesis introduces a new Data Grid scheduling algorithm, which dynamically makes replication and scheduling decisions, and introduces a Hadoop scheduling system, which uses system information such as estimated job arrival rates and mean job execution times to make scheduling decisions.
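A minimal sketch of the kind of decision such a scheduler makes: route a job to the node whose queue, given that node's mean job execution time, yields the earliest expected completion. The node names, speeds, and queue lengths are hypothetical, and the thesis's actual policy also uses job arrival rates:

```python
# Hypothetical sketch: pick the node with the smallest expected
# completion time from per-node mean execution time and queue length.

def pick_node(nodes, job_cost=1.0):
    """nodes maps name -> (mean_exec_time, queued_jobs)."""
    def expected_finish(item):
        name, (mean_time, queued) = item
        # Jobs already queued plus the new one, each costing roughly
        # mean_time, scaled by this job's relative size.
        return (queued + 1) * mean_time * job_cost
    return min(nodes.items(), key=expected_finish)[0]

nodes = {
    "fast-node": (2.0, 3),  # fast but busy: (3 + 1) * 2.0 = 8.0
    "slow-node": (5.0, 1),  # slow but idle: (1 + 1) * 5.0 = 10.0
}
chosen = pick_node(nodes)
```

Here the busy fast node still wins; a heterogeneous-aware scheduler makes exactly this trade-off instead of assuming identical nodes.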
H-DB: Yet Another Big Data Hybrid System of Hadoop and DBMS
A novel prototype, H-DB, is proposed that takes DBMSs as the underlying storage and execution units and Hadoop as an index layer and cache; it meets the demand, outperforms the original system, and is appropriate for analogous big data applications.
Dandelion: a compiler and runtime for heterogeneous systems
Dandelion automatically and transparently distributes data-parallel portions of a program to available computing resources, including compute clusters for distributed execution and CPU and GPU cores of individual nodes for parallel execution.