• Corpus ID: 250144193

Exploiting Network Loss for Distributed Approximate Computing with NetApprox

@inproceedings{Liu2019ExploitingNL,
  title={Exploiting Network Loss for Distributed Approximate Computing with NetApprox},
  author={Ke Liu and Jinmou Li and Shin-Yeh Tsai and Theophilus A. Benson and Yiying Zhang},
  year={2019}
}
Many data center applications such as machine learning and big data analytics can complete their analysis without processing the complete set of data. While extensive approximate-aware optimizations have been proposed at the hardware, programming language, and application levels, to date these optimizations have ignored the network layer. We propose NetApprox, which, to the best of our knowledge, is the first approximate-aware network layer comprising transport-layer… 

References

Workload analysis of a large-scale key-value store

This paper collects detailed traces from Facebook's Memcached deployment, arguably the world's largest, and analyzes the workloads from multiple angles, including: request composition, size, and rate; cache efficacy; temporal patterns; and application use cases.

Inside the Social Network's (Datacenter) Network

The contrasting locality, stability, and predictability of network traffic in Facebook's datacenters are reported on, and their implications for network architecture, traffic engineering, and switch design are commented on.

StreamApprox: approximate computing for stream analytics

An online stratified reservoir sampling algorithm that produces approximate output with rigorous error bounds is designed; it can be applied to two prominent types of stream processing systems: (1) batched stream processing such as Apache Spark Streaming, and (2) pipelined stream processing such as Apache Flink.

ApproxHadoop: Bringing Approximations to MapReduce Frameworks

The proposed framework and system can make approximation easily accessible to many application domains using the MapReduce model and can significantly reduce application execution time and/or energy consumption when the user is willing to tolerate small errors.

BlinkDB: queries with bounded errors and bounded response times on very large data

BlinkDB allows users to trade-off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.

Gradient Compression Supercharged High-Performance Data Parallel DNN Training

A compression-aware gradient synchronization architecture, CaSync, is proposed, which relies on a flexible composition of basic computing and communication primitives and is general and compatible with any gradient compression algorithms and gradient synchronization strategies, and enables high-performance computation-communication pipelining.

ACC: automatic ECN tuning for high-speed datacenter networks

An automatic run-time optimization scheme, ACC, is proposed; it leverages multi-agent reinforcement learning to dynamically adjust the marking threshold at each switch, has been applied in high-speed datacenter networks, and significantly simplifies network operations.

A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters

For representative DNN training jobs with up to 256 GPUs, BytePS outperforms the state-of-the-art open source all-reduce and PS by up to 84% and 245%, respectively.

Swift: Delay is Simple and Effective for Congestion Control in the Datacenter

In large-scale testbed experiments, Swift delivers a tail latency of <50μs for short RPCs with near-zero packet drops while sustaining ~100Gbps throughput per server, and it also provides high throughput for long RPCs.

Aeolus: A Building Block for Proactive Transport in Datacenters

Aeolus is a solution focusing on "pre-credit" packet transmission as a building block for proactive transports; it contains unconventional design principles such as scheduled-packet-first (SPF), which de-prioritizes the first-RTT packets instead of prioritizing them as prior work does.
...