The log-structured merge-tree (LSM-tree)

Patrick E. O'Neil, Edward Y. C. Cheng, Dieter Gawlick, Elizabeth J. O'Neil
Acta Informatica

High-performance transaction system applications typically insert rows in a History table to provide an activity trace; at the same time the transaction system generates log records for purposes of system recovery. Both types of generated information can benefit from efficient indexing. An example in a well-known setting is the TPC-A benchmark application, modified to support efficient queries on the history for account activity for specific accounts. This requires an index by account-id on the… 

Bi-directional Log-Structured Merge Tree

The Bi-directional LSM-tree is proposed, which differs from the classical LSM-tree in that hot records can move to higher levels to improve the overall LSM organization and benefit future range queries.

An LSM-Tree Index for Spatial Data

By building an SER-tree (an R-tree embedded in an SSTable) index structure for each storage component, this paper optimised dual index queries into single ones, organised the SER-tree indexes into an ER-tree index with a binary linked list, and showed that the query performance of the ER-tree index was effectively improved compared to that of stand-alone R-tree indexes.

Efficient Key-Value Stores with Ranged Log-Structured Merge Trees

To reduce write amplification and memory overhead, RLSM simplifies the logical layout of storage and keeps data in unsorted order, and it prevents read performance from declining by partitioning data on the disk into multiple files with non-overlapping ranges.

Incremental join view maintenance on distributed log-structured storage

The design space is examined and several design features are identified for implementing a view on a distributed log-structured merge-tree (LSM-tree), a well-known structure for improving data write performance.

Incremental Materialized View Maintenance on Distributed Log-Structured Merge-Tree

This paper develops materialized views on a distributed log-structured merge-tree (LSM-tree), a well-known structure adopted to improve data write performance, and achieves better performance than straightforward methods on different workloads.

Precise Data Access on Distributed Log-Structured Merge-Tree

This work proposes a precise data access strategy for log-structured merge-trees: a Bloom filter-based structure is designed to test whether an element exists in the in-writing part of the LSM-tree, and a lease-based synchronization strategy is used to maintain consistent copies of the Bloom filter on remote query servers.

A Comparative Study of Log-Structured Merge-Tree-Based Spatial Indexes for Big Data

This paper implements five variants of disk-resident spatial indexing methods in the form of Log-Structured Merge-tree-based (LSM) spatial indexes in order to evaluate their pros and cons for dynamic geo-tagged Big Data, and implements the alternatives in Apache AsterixDB, an open source Big Data management system.

The LSM RUM-Tree: A Log Structured Merge R-Tree for Update-intensive Spatial Workloads

The LSM RUM-tree introduces novel strategies to reduce the size of the Update Memo so that it becomes a lightweight in-memory structure suitable for handling update-intensive workloads without introducing significant overhead.

NIOSIT: efficient data access for log-structured merge-tree style storage systems

A new network request processing mechanism is designed to allow data access to be processed in an auxiliary lightweight network communication IO thread and a Bloom filter is incorporated with the network IO thread to effectively filter out the empty reads.

Partition pruning for range query on distributed log-structured merge-tree

A partition pruning strategy is proposed to save cost for range queries, and a version-based cache synchronization strategy is proposed to ensure that queries obtain the latest data state.



The SB-tree: an index-sequential structure for high-performance sequential access

  • P. O'Neil
  • Computer Science
  • Acta Informatica
  • 2005
A performance analysis formulates a new useful concept, the ‘effective depth’ of an SB- or B+-tree, defined as the expected number of pages read from disk to perform a random retrieval search given standard buffering behavior, and a graph of effective depth against tree size is shown to have a scalloped appearance, reflecting the changing effectiveness of incremental additions to buffer space.

A Log-Structured History Data Access Method (LHAM)

This paper introduces a new access method for history data, called the Log-structured History data Access Method (LHAM), to partition the data into successive components based on the timestamps of the record versions, and to employ a rolling merge process for efficient data migration between components.

The design and implementation of a log-structured file system

A prototype log-structured file system called Sprite LFS is implemented; it outperforms current Unix file systems by an order of magnitude for small-file writes while matching or exceeding Unix performance for reads and large writes.

The performance of a multiversion access method

Using both analysis and simulation, the amount of redundancy, the space utilization, and the record addition (insert or update) performance for a spectrum of different rates of insertion versus update are characterised.

Selective Deferred Index Maintenance & Concurrency Control in Integrated Information Systems

This paper discusses how the time for transactions that cause text index updates can be shortened by integrating a dedicated predicate-oriented concurrency control method and a selective deferred index update strategy.

Distributed logging for transaction processing

It is argued that a high-performance, microprocessor-based processing node can support a log server if it uses efficient communication protocols and low-latency, non-volatile storage to buffer log data.

A simple bounded disorder file organization with good performance

This paper presents two important improvements to the bounded-disorder organization as originally described, which has utilization, random access performance, and file growth performance that can be competitive with good extendible hashing methods, while supporting high-performance sequential access.

The Escrow transactional method

The Escrow Method offered here is designed to support nonblocking record updates by transactions that are “long lived” and thus require long periods to complete, and several advantages result.

Concurrency of operations on B-trees

It is concluded that B-trees can be used advantageously in a multi-user environment because the solution presented here uses simple locking protocols which can be tuned to specific requirements.

Transaction Processing: Concepts and Techniques

Using transactions as a unifying conceptual framework, the authors show how to build high-performance distributed systems and high-availability applications with finite budgets and risk.