Measuring the Performance of Data Placement Structures for MapReduce-based Data Warehousing Systems

  • S. Kami Makki, Mohammad Rakibul Hasan
  • Published 2018
  • Computer Science
  • International Journal of New Computer Architectures and Their Applications
The exponential growth of data requires systems that provide a scalable and fault-tolerant infrastructure for efficiently storing and processing vast amounts of data. Hive is a MapReduce-based data warehouse for data aggregation and query analysis. This data warehousing system can arrange millions of rows of data into tables, and its data placement structures play a significant role in increasing the performance of the data warehouse. Hive also provides a SQL-like language called…

Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in



RCFile: A fast and space-efficient data placement structure in MapReduce-based warehouse systems

This paper presents a big data placement structure called RCFile (Record Columnar File) and its implementation in the Hadoop system and shows the effectiveness of RCFile in satisfying the four requirements.

Major Technical Advancements in Apache Hive

A community-based effort on technical advancements in Hive provides significant improvements on storage efficiency and query execution performance and shows how academic research lays a foundation for Hive to improve its daily operations.

Understanding Insights into the Basic Structure and Essential Issues of Table Placement Methods in Clusters

A benchmarking tool is designed and implemented to provide insights into how variations of each factor affect the I/O performance of reading data of a table stored by a table placement method, and suggested actions to optimize table reading performance are given.

Hive - a petabyte scale data warehouse using Hadoop

This paper presents Hive, an open-source data warehousing solution built on top of Hadoop that supports queries expressed in HiveQL, a SQL-like declarative language; these queries are compiled into map-reduce jobs that are executed using Hadoop.

Hadoop: The definitive guide (Vol. 54)

  • 2015

Scaling the Facebook data warehouse to 300 PB

  • 2014

The Data Explosion in 2014 Minute by Minute - Infographic