Decentralized Task-Aware Scheduling for Data Center Networks

@inproceedings{Dogar2013DecentralizedTS,
  title={Decentralized Task-Aware Scheduling for Data Center Networks},
  author={Fahad R. Dogar and T. Karagiannis and Hitesh Ballani and A. Rowstron},
  year={2013}
}
Many data center applications perform rich and complex tasks (e.g., executing a search query or generating a user’s news-feed). From a network perspective, these tasks typically comprise multiple flows, which traverse different parts of the network at potentially different times. Most network resource allocation schemes, however, treat all these flows in isolation – rather than as part of a task – and therefore only optimize flow-level metrics. In this paper, we show that task-aware network…
Joint Scheduling of Tasks and Network Flows in Big Data Clusters
TLDR
A software-defined network (SDN)-based online scheduling framework that selects task placement based on the available bandwidth on the SDN switches while optimally allocating bandwidth to each data flow, making full use of the network bandwidth.
Network Scheduling and Compute Resource Aware Task Placement in Datacenters
TLDR
NEAT+ is a task scheduling framework that leverages information from the underlying network scheduler and available compute resources to make task placement decisions and leverages the predicted task completion times to minimize the average completion time of active tasks.
Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can
TLDR
Corral is a scheduling framework that uses characteristics of future workloads to determine an offline schedule which jointly places data and compute to achieve better data locality, and isolates jobs both spatially (by scheduling them in different parts of the cluster) and temporally, improving their performance.
Towards shorter task completion time in datacenter networks
TLDR
This paper examines the possibility of considering task-level and flow-level metrics together and presents the design of TAFA (Task-Aware and Flow-Aware) for data center networks, showing that TAFA can obtain near-optimal performance and reduce task completion time by over 35% for existing data center systems.
Task-Aware TCP in Data Center Networks
TLDR
This work reveals that relinquishing the bandwidth of leading flows to the stalled ones effectively reduces the task completion time, and presents the design and implementation of a general supporting scheme that shares flow-tardiness information through receiver-driven coordination.
Rapier: Integrating routing and scheduling for coflow-aware data center networks
TLDR
This work presents Rapier, a coflow-aware network optimization framework that seamlessly integrates routing and scheduling for better application performance, and demonstrates that Rapier significantly reduces the average coflow completion time.
OPTAS: Decentralized flow monitoring and scheduling for tiny tasks
TLDR
OPTAS is a lightweight, commodity-switch-compatible scheduling solution that efficiently monitors and schedules flows for tiny tasks with low overhead, and transfers tiny tasks in a FIFO manner by adjusting two attributes, namely, the window size and round trip time, of TCP.
Minimizing average coflow completion time with decentralized scheduling
TLDR
D-CAS is a preemptive, decentralized, coflow-aware scheduling system that pursues the coflows' minimum-remaining-time-first (MRTF) principle by leveraging a simple negotiation mechanism between each coflow's data senders and receivers; it achieves performance close to Varys and outperforms Baraat.
Providing In-network Support to Coflow Scheduling
TLDR
It is shown that dynamically changing flow priorities at the end-host, without considering in-flight packets, can cause high degrees of packet reordering, thus imposing pressure on the congestion control and potentially harming network performance in the presence of switches with shallow buffers.
Adia: Achieving High Link Utilization with Coflow-Aware Scheduling in Data Center Networks
TLDR
Adia is a hierarchical scheduling framework that conducts both inter- and intra-link scheduling; it leverages priority-based scheduling while guaranteeing work-conserving and starvation-free bandwidth allocation, and its algorithm is proved to be two-approximate in terms of link utilization.

References

Better never than late: meeting deadlines in datacenter networks
The soft real-time nature of large-scale web applications in today's datacenters, combined with their distributed workflow, leads to deadlines being associated with the datacenter application…
Speeding up distributed request-response workflows
TLDR
Kwiken, a framework that takes an end-to-end view of latency improvements and costs, decomposes the problem of minimizing latency over a general processing DAG into a manageable optimization over individual stages.
Hedera: Dynamic Flow Scheduling for Data Center Networks
TLDR
Hedera is a scalable, dynamic flow scheduling system that adaptively schedules a multi-stage switching fabric to efficiently utilize aggregate network resources; it delivers bisection bandwidth that is 96% of optimal and up to 113% better than static load-balancing methods.
Finishing flows quickly with preemptive scheduling
TLDR
It is demonstrated that PDQ significantly outperforms TCP, RCP and D3 in data center environments, and is stable, resilient to packet loss, and preserves nearly all its performance gains even given inaccurate flow information.
Symbiotic routing in future data centers
TLDR
This paper designs an extended routing service allowing easy implementation of application-specific routing protocols on CamCube, and demonstrates the benefits and network-level impact of running multiple routing protocols.
PACMan: Coordinated Memory Caching for Parallel Jobs
TLDR
PACMan is a caching service that coordinates access to the distributed caches, reducing the average completion time of jobs and improving the efficiency of the cluster by 47% and 54%, respectively, on production workloads from Facebook and Microsoft Bing.
A scalable, commodity data center network architecture
TLDR
This paper shows how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements and argues that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions.
Sharing the Data Center Network
TLDR
This work presents Seawall, a network bandwidth allocation scheme that divides network capacity based on an administrator-specified policy, adds little overhead, and achieves strong performance isolation.
Managing data transfers in computer clusters with orchestra
Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting…
Sparrow: Scalable Scheduling for Sub-Second Parallel Jobs
TLDR
It is demonstrated that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design, and performs within 14% of an ideal scheduler.