Milind Bhandarkar

Learn More
Hadoop Distributed File System (HDFS) presents unique challenges to the existing energy-conservation techniques and makes it hard to scale-down servers. We propose an energy-conserving, hybrid, logical multi-zoned variant of HDFS for managing data-processing intensive, commodity Hadoop cluster. Green HDFS's data-classification driven data placement allows(More)
HAWQ, developed at Pivotal, is a massively parallel processing SQL engine sitting on top of HDFS. As a hybrid of MPP database and Hadoop, it inherits the merits from both parties. It adopts a layered architecture and relies on the distributed file system for data replication and fault tolerance. In addition, it is standard SQL compliant, and unlike other(More)
— Typical Hadoop setups employ Direct Attached Storage (DAS) with compute nodes and uniform replication of data to sustain high I/O throughput and fault tolerance. However, not all data is accessed at the same time or rate. Thus, if a large replication factor is used to support higher throughput for popular data, it wastes storage by unnecessarily(More)
From it's beginnings as a framework for building web crawlers for small-scale search engines to being one of the most promising technologies for building datacenter-scale distributed computing and storage platforms, Apache Hadoop has come far in the last seven years. In this talk I will reminisce about the early days of Hadoop, and will give an overview of(More)
  • 1