The log-structured merge-tree (LSM-tree)

@article{ONeil2009TheLM,
  title={The log-structured merge-tree (LSM-tree)},
  author={Patrick E. O'Neil and Edward Y. C. Cheng and Dieter Gawlick and Elizabeth J. O'Neil},
  journal={Acta Informatica},
  year={1996},
  volume={33},
  pages={351-385}
}
High-performance transaction system applications typically insert rows in a History table to provide an activity trace; at the same time the transaction system generates log records for purposes of system recovery. Both types of generated information can benefit from efficient indexing. An example in a well-known setting is the TPC-A benchmark application, modified to support efficient queries on the history for account activity for specific accounts. This requires an index by account-id on the… 
Efficient Key-Value Stores with Ranged Log-Structured Merge Trees
TLDR
To reduce write amplification and memory overhead, RLSM simplifies the logical layout of storage and keeps data in unsorted order; it prevents read performance from declining by partitioning data on disk into multiple files with non-overlapping ranges.
Incremental join view maintenance on distributed log-structured storage
TLDR
The design space is examined and several design features are identified for implementing a view on a distributed log-structured merge-tree (LSM-tree), a well-known structure for improving data write performance.
Incremental Materialized View Maintenance on Distributed Log-Structured Merge-Tree
TLDR
This paper develops materialized views on a distributed log-structured merge-tree (LSM-tree), a well-known structure adopted to improve data write performance, and achieves better performance than straightforward methods on different workloads.
Precise Data Access on Distributed Log-Structured Merge-Tree
TLDR
This work proposes the precise data access strategy for log-structured merge tree: a Bloom filter-based structure is designed to test whether an element exists in the in-writing part of the LSM-tree and a lease-based synchronization strategy is used to maintain consistent copies of the Bloom filter on remote query servers.
A Comparative Study of Log-Structured Merge-Tree-Based Spatial Indexes for Big Data
TLDR
This paper implements five variants of disk-resident spatial indexing methods as Log-Structured Merge-tree-based (LSM) spatial indexes in order to evaluate their pros and cons for dynamic geo-tagged Big Data; the alternatives are implemented in Apache AsterixDB, an open source Big Data management system.
The LSM RUM-Tree: A Log Structured Merge R-Tree for Update-intensive Spatial Workloads
TLDR
The LSM RUM-tree introduces novel strategies that reduce the Update Memo to a lightweight in-memory structure suitable for handling update-intensive workloads without introducing significant overhead.
NIOSIT: efficient data access for log-structured merge-tree style storage systems
TLDR
A new network request processing mechanism is designed to allow data access to be processed in an auxiliary lightweight network communication IO thread, and a Bloom filter is incorporated into the network IO thread to filter out empty reads effectively.
Partition pruning for range query on distributed log-structured merge-tree
TLDR
A partition pruning strategy is proposed to reduce the cost of range queries, along with a version-based cache synchronization strategy that ensures queries obtain the latest data state.
Fault-tolerant precise data access on distributed log-structured merge-tree
TLDR
This work proposes a precise data access strategy comprising an efficient structure with low maintenance overhead, designed to test whether a record exists in the in-writing part of the LSM-tree, and a lease-based synchronization strategy that maintains consistent copies of the structure on remote query servers.
Deferred Lightweight Indexing for Log-Structured Key-Value Stores
TLDR
DELI is presented, a DEferred Lightweight Indexing scheme for log-structured key-value stores that optimizes the performance of index garbage collection by tightly coupling its execution with a native routine process called compaction.

References

The SB-tree: an index-sequential structure for high-performance sequential access
  • P. O'Neil
  • Computer Science
    Acta Informatica
  • 1992
TLDR
A performance analysis formulates a useful new concept, the ‘effective depth’ of an SB- or B+-tree, defined as the expected number of pages read from disk to perform a random retrieval search given standard buffering behavior, and a graph of effective depth against tree size is shown to have a scalloped appearance, reflecting the changing effectiveness of incremental additions to buffer space.
A Log-Structured History Data Access Method (LHAM)
TLDR
This paper introduces a new access method for history data, called the Log-structured History data Access Method (LHAM), which partitions the data into successive components based on the timestamps of the record versions and employs a rolling merge process for efficient data migration between components.
The design and implementation of a log-structured file system
TLDR
A prototype log-structured file system called Sprite LFS is implemented; it outperforms current Unix file systems by an order of magnitude for small-file writes while matching or exceeding Unix performance for reads and large writes.
The performance of a multiversion access method
TLDR
Using both analysis and simulation, the amount of redundancy, the space utilization, and the record addition (insert or update) performance for a spectrum of different rates of insertion versus update are characterised.
Selective Deferred Index Maintenance & Concurrency Control in Integrated Information Systems
TLDR
The paper discusses how the time for transactions that cause text index updates can be shortened by integrating a dedicated predicate-oriented concurrency control method and a selective deferred index update strategy.
Distributed logging for transaction processing
TLDR
It is argued that a high-performance, microprocessor-based processing node can support a log server if it uses efficient communication protocols and low-latency, non-volatile storage to buffer log data.
A simple bounded disorder file organization with good performance
TLDR
This paper presents two important improvements to the bounded-disorder organization as originally described, which has utilization, random access performance, and file growth performance that can be competitive with good extendible hashing methods, while supporting high-performance sequential access.
The Escrow transactional method
TLDR
The Escrow Method offered here is designed to support nonblocking record updates by transactions that are “long lived” and thus require long periods to complete, and several advantages result.
Concurrency of operations on B-trees
TLDR
It is concluded that B-trees can be used advantageously in a multi-user environment because the solution presented here uses simple locking protocols which can be tuned to specific requirements.
Transaction Processing: Concepts and Techniques
TLDR
Using transactions as a unifying conceptual framework, the authors show how to build high-performance distributed systems and high-availability applications with finite budgets and risk.