On the Effect of Task-to-Worker Assignment in Distributed Computing Systems with Stragglers

@inproceedings{BehrouziFar2018OnTE,
  title={On the Effect of Task-to-Worker Assignment in Distributed Computing Systems with Stragglers},
  author={Amir Behrouzi-Far and E. Soljanin},
  booktitle={2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)},
  year={2018},
  pages={560--566}
}
  • Amir Behrouzi-Far, E. Soljanin
  • Published 2018
  • Computer Science
  • 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
  • We study the expected completion time of some recently proposed algorithms for distributed computing which redundantly assign computing tasks to multiple machines in order to tolerate a certain number of machine failures. We analytically show that not only the amount of redundancy but also the task-to-machine assignments affect the latency in a distributed system. We study systems with a fixed number of computing tasks that are split in possibly overlapping batches, and independent…


    Data Replication for Reducing Computing Time in Distributed Systems with Stragglers
    Efficient Replication for Straggler Mitigation in Distributed Computing
    Scheduling in the Presence of Data Intensive Compute Jobs
    Heterogeneous Computation across Heterogeneous Workers
    Heterogeneous Coded Computation across Heterogeneous Workers
    Redundancy Scheduling in Systems with Bi-Modal Job Service Time Distributions
    Straggler-aware Distributed Learning: Communication Computation Latency Trade-off
    Coded Distributed Computing with Partial Recovery

    References

    Publications referenced by this paper.
    Using Straggler Replication to Reduce Latency in Large-scale Parallel Computing
    Smart Redundancy for Distributed Computation
    Straggler Mitigation by Delayed Relaunch of Tasks
    Distributed Computing: Principles, Algorithms, and Systems
    Improving Distributed Gradient Descent Using Reed-Solomon Codes
    Near-Optimal Straggler Mitigation for Distributed Gradient Methods
    The Hadoop Distributed File System
    A new parallel matrix multiplication algorithm on distributed-memory concurrent computers