EASY: Efficient Segment Assignment Strategy for Reducing Tail Latencies in Pinot

@inproceedings{Javadi2018EASYES,
  title={{EASY}: Efficient Segment Assignment Strategy for Reducing Tail Latencies in Pinot},
  author={Seyyed Ahmad Javadi and Harsh Gupta and Robin Manhas and Shweta Sahu and Anshul Gandhi},
  booktitle={2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS)},
  year={2018},
  pages={1432--1437}
}
Customer-facing online services, such as LinkedIn and Uber, rely on scalable, low-latency data stores to maintain acceptable query tail latencies. An important challenge in managing the performance of these systems is the assignment of newly created data segments to data nodes so as to balance load. Given the rate at which these services are accessed (and thus generate new data), the segment assignment problem is particularly important. This paper presents EASY, an efficient segment assignment…
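To make the problem setting concrete, here is a minimal sketch of a generic greedy least-loaded baseline for segment assignment. Since the abstract is truncated, this is not the EASY strategy from the paper; the function name, the per-segment size metric, and the node-count parameter are all illustrative assumptions.

```python
# Hypothetical sketch of segment-to-node assignment (NOT the paper's EASY
# algorithm): each new segment is placed on the currently least-loaded node,
# a common greedy baseline for load balancing in distributed data stores.
import heapq

def assign_segments(segments, num_nodes):
    """Assign each (segment_id, size) pair to the least-loaded node.

    Returns a dict mapping segment_id -> node index.
    """
    # Min-heap of (current_load, node_id); the root is the least-loaded node.
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    placement = {}
    for seg_id, size in segments:
        load, node = heapq.heappop(heap)   # least-loaded node so far
        placement[seg_id] = node
        heapq.heappush(heap, (load + size, node))  # account for the new segment
    return placement

# Example: four segments of varying size spread across two nodes.
segments = [("s1", 5), ("s2", 3), ("s3", 8), ("s4", 2)]
print(assign_segments(segments, 2))
```

Such a baseline balances aggregate load but ignores query skew across segments, which is one reason tail latencies can still suffer and why smarter assignment strategies like the one this paper proposes are of interest.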