Advancements in YARN Resource Manager

@inproceedings{Karanasos2019AdvancementsIY,
  title={Advancements in YARN Resource Manager},
  author={Konstantinos Karanasos and Arun Suresh and Chris Douglas},
  booktitle={Encyclopedia of Big Data Technologies},
  year={2019}
}
Apache Hadoop (2017), one of the most widely adopted implementations of MapReduce (Dean and Ghemawat 2004), revolutionized the way that companies perform analytics over vast amounts of data. It enables parallel data processing over clusters comprising thousands of machines while relieving the user of implementing complex communication patterns and fault tolerance mechanisms. With its rise in popularity came the realization that Hadoop’s resource model for MapReduce, albeit flexible, is…
Hydra: a federated resource manager for data-center scale analytics
Microsoft’s internal data lake processes exabytes of data over millions of cores daily on behalf of thousands of tenants. Scheduling this workload requires 10x to 100x more decisions per second than
Medea: scheduling of long running applications in shared production clusters
TLDR
Evaluated on a 400-node cluster, the implementation of Medea on Apache Hadoop YARN achieves placement of long-running applications with significant performance and resilience benefits compared to state-of-the-art schedulers.
TonY: An Orchestrator for Distributed Machine Learning Jobs
TLDR
TonY is an open-source orchestrator for distributed ML jobs, built at LinkedIn to address the challenges of managing a distributed training job: resource contention, distributed configurations, monitoring, and fault tolerance.
Survey of Methodologies, Approaches, and Challenges in Parallel Programming Using High-Performance Computing Systems
TLDR
This paper reviews contemporary methodologies and APIs for parallel programming, with representative technologies selected by target system type, communication patterns, and programming abstraction level, in order to identify trends in high-performance computing and the challenges to be addressed in the near future.

References

Apache Hadoop YARN: yet another resource negotiator
TLDR
This paper summarizes the design, development, and current state of deployment of YARN, the next generation of Hadoop's compute platform, which decouples the programming model from the resource management infrastructure and delegates many scheduling functions to per-application components.
Morpheus: Towards Automated SLOs for Enterprise Clusters
TLDR
Morpheus is a new system that codifies implicit user expectations as explicit Service Level Objectives (SLOs) inferred from historical data, enforces SLOs using novel scheduling techniques that isolate jobs from sharing-induced performance variability, and mitigates inherent performance variance by means of dynamic reprovisioning of jobs.
MapReduce: Simplified Data Processing on Large Clusters
TLDR
This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
TLDR
Mercury, a hybrid resource management framework, is proposed; it supports the full spectrum of scheduling, from centralized to distributed, and exposes a programmatic interface that allows applications to trade off scheduling overhead against execution guarantees.
Efficient queue management for cluster scheduling
TLDR
This is the first work to provide principled solutions to the above problems by introducing queue management techniques, such as appropriate queue sizing, prioritization of task execution via queue reordering, starvation freedom, and careful placement of tasks to queues.
Reservation-based Scheduling: If You're Late Don't Blame Us!
TLDR
This paper proposes a reservation definition language (RDL) that allows users to declaratively reserve access to cluster resources, formalizes planning of current and future cluster resources as a Mixed-Integer Linear Programming (MILP) problem, and proposes scalable heuristics.
Dominant Resource Fairness: Fair Allocation of Multiple Resource Types
TLDR
Dominant Resource Fairness (DRF), a generalization of max-min fairness to multiple resource types, is proposed, and it is shown that it leads to better throughput and fairness than the slot-based fair sharing schemes in current cluster schedulers.
Lessons learned from scaling YARN to 40K machines in a multitenancy environment (2017)