MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases

@article{Lee2009MCCDBMC,
  title={MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases},
  author={Rubao Lee and Xiaoning Ding and Feng Chen and Qingda Lu and Xiaodong Zhang},
  journal={Proc. VLDB Endow.},
  year={2009},
  volume={2},
  pages={373--384}
}
In a typical commercial multi-core processor, the last level cache (LLC) is shared by two or more cores. Existing studies have shown that the shared LLC is beneficial to concurrent query processes with commonly shared data sets. However, the shared LLC can also be a performance bottleneck for concurrent queries, each of which has private data structures, such as a hash table for the widely used hash join operator, causing serious cache conflicts. We show that cache conflicts on multi-core… 


CARIC-DA: Core Affinity with a Range Index for Cache-Conscious Data Access in a Multicore Environment

TLDR
This paper proposes CARIC-DA, middleware for achieving higher performance in DBMSs on multicore processors by reducing cache misses with a new cache-conscious dispatcher for concurrent queries, and implemented a prototype that uses unmodified existing Linux and PostgreSQL environments.

Cache-Conscious Data Access for DBMS in Multicore Environments

TLDR
This paper proposes CARIC-DA, middleware for achieving higher performance in DBMSs on multicore processors, by reducing cache misses with a new cache-conscious dispatcher for concurrent queries, and evaluated the effectiveness of the proposal on three different multicore platforms.

W-Order Scan: Minimizing Cache Pollution by Application Software Level Cache Management for MMDB

TLDR
The experimental results show that DBMSs can improve cache performance by controlling weak-locality data access patterns themselves, as opposed to depending on support from hardware or the OS.

Cache Hierarchy-Aware Query Mapping on Emerging Multicore Architectures

TLDR
The proposed scheme distributes a given batch of queries across the cores of a target multicore architecture based on the affinity relations among the queries, to maximize the utilization of the underlying on-chip cache hierarchy while keeping the load nearly balanced across domain affinities.

Scaling Up Concurrent Analytical Workloads on Multi-Core Servers

TLDR
It is argued that sharing and NUMA-awareness are key factors for supporting faster processing of big data analytical applications, fully exploiting the hardware resources of modern multi-core servers, and providing a more responsive user experience.

ULCC: a user-level facility for optimizing shared cache performance on multicores

TLDR
ULCC (User Level Cache Control) is a software runtime library that enables programmers to explicitly manage and optimize last-level cache usage by allocating proper cache space to the different data sets of different threads; it is implemented at the user level on top of a page-coloring technique for last-level cache management.

Soft-OLP: Improving Hardware Cache Performance through Software-Controlled Object-Level Partitioning

TLDR
Experimental results show that, in comparison with a standard L2 cache managed by LRU, Soft-OLP significantly reduces execution time by reducing L2 cache misses across inputs for a set of single- and multi-threaded programs from the SPEC CPU2000 benchmark suite, the NAS benchmarks, and a computational kernel set.

MOSS-DB: A Hardware-Aware OLAP Database

TLDR
A hardware-aware OLAP model named MOSS-DB, which optimizes the storage model according to the data-access features of dimension tables and fact tables; it outperforms a conventional DRDB system and also outperforms MMDB in SSB testing.

Accelerating Concurrent Workloads with CPU Cache Partitioning

TLDR
This work devises a cache-allocation scheme from an empirical analysis of different operators, integrates a cache-partitioning mechanism into the execution engine of a commercial DBMS, and demonstrates that this approach improves overall system performance.

A Pressure-Aware Policy for Contention Minimization on Multicore Systems

TLDR
This work formulates a fine-grained application-characterization methodology that leverages Performance Monitoring Counters (PMCs) and Cache Monitoring Technology (CMT) in Intel processors to develop two contention-aware scheduling policies that co-schedule applications based on their resource-interference profiles.

References

Showing 1–10 of 35 references

Main-memory scan sharing for multi-core CPUs

TLDR
This work proposes a novel FullSharing scheme that allows all concurrent queries, when performing base-table I/O, to share the cache belonging to a given core, and uses lottery-scheduling techniques to ensure fairness and impose a hard upper bound on staging time to avoid starvation.

An analysis of database workload performance on simultaneous multithreaded processors

TLDR
Examining database performance on SMT processors using traces of the Oracle database management system characterizes the memory-system behavior of database systems running on-line transaction processing and decision support system workloads and shows that SMT's latency tolerance is highly effective for database applications.

Soft-OLP: Improving Hardware Cache Performance through Software-Controlled Object-Level Partitioning

TLDR
Experimental results show that, in comparison with a standard L2 cache managed by LRU, Soft-OLP significantly reduces execution time by reducing L2 cache misses across inputs for a set of single- and multi-threaded programs from the SPEC CPU2000 benchmark suite, the NAS benchmarks, and a computational kernel set.

Database Architecture Optimized for the New Bottleneck: Memory Access

TLDR
A simple scan test is used to show the severe impact of main-memory access bottleneck, and radix algorithms for partitioned hash-join are introduced, using a detailed analytical model that incorporates memory access cost.

Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

This paper investigates the problem of partitioning a shared cache between multiple concurrently executing applications. The commonly used LRU policy implicitly partitions a shared cache on a demand basis…

Buffering database operations for enhanced instruction cache performance

TLDR
This work answers the question "Why does a database system incur so many instruction cache misses" and proposes techniques to buffer database operations during query execution to avoid instruction cache thrashing.

Generic Database Cost Models for Hierarchical Memory Systems

Karma: Know-It-All Replacement for a Multilevel Cache

TLDR
Karma is presented, a global non-centralized, dynamic and informed management policy for multiple levels of cache that leverages application hints to make informed allocation and replacement decisions in all cache levels, preserving exclusive caching and adjusting to changes in access patterns.

Buffering Accesses to Memory-Resident Index Structures

DBMSs on a Modern Processor: Where Does Time Go?

TLDR
This paper examines four commercial DBMSs running on an Intel Xeon and NT 4.0 and introduces a framework for analyzing query execution time, and finds that database developers should not expect the overall execution time to decrease significantly without addressing stalls related to subtle implementation issues.