Apache Hadoop YARN: yet another resource negotiator
TLDR
We present YARN, the next generation of the Hadoop compute platform, which departs from Hadoop's familiar monolithic architecture.
  • 1,555
  • 257
Reining in the Outliers in Map-Reduce Clusters using Mantri
TLDR
We present Mantri, a system that monitors tasks and culls outliers; using real-time progress reports, it detects and acts on outliers early in their lifetime.
  • 723
  • 76
Sharing the Data Center Network
TLDR
We present Seawall, a network bandwidth allocation scheme that divides network capacity based on an administrator-specified policy.
  • 371
  • 30
Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications
TLDR
The broad success of Hadoop has led to a fast-evolving and diverse ecosystem of application engines that are building upon the YARN resource management layer.
  • 161
  • 25
Precambrian basin-margin fan deposits: Mesoproterozoic Bagalkot Group, India
Abstract: Three-dimensional facies variability in coarse clastic sedimentary rocks (breccia, conglomerates and coarse-grained sandstones) at the base of the Ramdurg Formation suggests terrestrial…
  • 35
SHC: Distributed Query Processing for Non-Relational Data Store
TLDR
We introduce a simple data model to process non-relational data for relational operations, and SHC (Apache Spark - Apache HBase Connector), an implementation of this model in the cluster computing framework, Spark.
  • 1
SAC: A System for Big Data Lineage Tracking
TLDR
In the era of big data, a data processing flow contains various types of tasks.
  • 1