Log Structured Storage Engines

From Designing Data-Intensive Applications, Chapter 3: Storage and Retrieval

So, there are two main types of storage engines I want to talk about:

  • Log-structured: data gets appended to the end of a file.
  • Page-oriented, such as B-Trees.

We're going to focus mainly on log-structured storage engines in this article.

Log-structured storage is crazy fast for writes, but if you don't have indexes, reads are O(n), which is not great.

Indexes are basically metadata about the original data that make queries way faster. For example, a hash index is an in-memory key-value store that holds, for each key, the offset of its latest value inside a segment (for key-value log-structured data).

But, as always, there's a tradeoff: indexes speed up reads but slow down writes, because every write now also means updating the index.

An example of a log-structured storage engine is Bitcask, the storage engine of Riak.
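
To make this concrete, here's a minimal Python sketch of an append-only log with an in-memory hash index, in the spirit of Bitcask. All names here are hypothetical and the format is deliberately naive; this is a sketch of the idea, not Bitcask's actual API:

class LogWithHashIndex:
    # A minimal append-only key-value log with an in-memory hash index.
    def __init__(self, path):
        self.path = path
        self.index = {}                      # key -> byte offset of its latest record
        open(self.path, "ab").close()        # make sure the log file exists

    def set(self, key, value):
        with open(self.path, "ab") as f:     # append-only: never overwrite in place
            offset = f.tell()                # where this record starts
            f.write(f"{key},{value}\n".encode())
        self.index[key] = offset             # updating the index is the extra write cost

    def get(self, key):
        offset = self.index.get(key)         # O(1) lookup instead of an O(n) scan
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)                   # jump straight to the latest record
            line = f.readline().decode().rstrip("\n")
        return line.split(",", 1)[1]

db = LogWithHashIndex("data.log")
db.set("Key1", "123")
db.set("Key1", "7")                          # the old record stays in the file
print(db.get("Key1"))                        # -> 7: the index points at the latest value

Notice that the old value for Key1 is still sitting in the file; only the index moved on. That leftover data is exactly what merging and compaction, discussed next, are there to clean up.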

Merging and Compaction in Log-Structured Storage: Why They Matter

Modern databases often use a log-structured approach for storage, where all writes are appended sequentially to disk rather than updating data in place. This design brings significant benefits for write performance and crash recovery, as sequential writes are faster and more reliable than random disk updates. However, it also introduces new challenges: over time, the log accumulates multiple versions of the same key, as well as records for deleted data. Without management, this leads to wasted disk space and slower read performance.

What Are Merging and Compaction?

Merging and compaction are background processes that periodically reorganize storage in log-structured systems. Here’s how they work:

  • Merging: Combines several log segments (files) into a single, larger segment.
  • Compaction: During the merge, the system discards outdated or deleted values, keeping only the latest version for each key.

For example, consider these two log segments before compaction:

segment1:  [Key1: 123 | Key2: 45 | Key1: 7 | Key3: 8]
segment2:  [Key2: 9 | Key1: 6 | Key3: 11 | Key1: 7]        

After compaction, only the most recent values remain:

final segment:  [Key1: 7 | Key2: 9 | Key3: 11]        

  • Another technique for avoiding running out of disk space is to cap segment size: once a segment reaches a certain size, subsequent writes go into a new segment.
  • Another technique is to merge compacted segments into one file. While the merge is running, reads and writes can still be served from the old, unmerged segments; once the merge is done, we switch reads and writes to the new segment and delete the old ones (see the sketch below).
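
As a rough illustration, here is how a merge-and-compact pass could reproduce the example above. This is a simplified sketch that works on in-memory lists; real engines operate on segment files:

def merge_and_compact(segments):
    # Segments are ordered oldest -> newest, and within a segment later
    # records win, so a single left-to-right pass keeps only the latest value.
    latest = {}
    for segment in segments:                 # oldest segment first
        for key, value in segment:           # later records overwrite earlier ones
            latest[key] = value
    return list(latest.items())              # the new, compacted segment

segment1 = [("Key1", 123), ("Key2", 45), ("Key1", 7), ("Key3", 8)]
segment2 = [("Key2", 9), ("Key1", 6), ("Key3", 11), ("Key1", 7)]

print(merge_and_compact([segment1, segment2]))
# -> [('Key1', 7), ('Key2', 9), ('Key3', 11)]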

Why Are They Important?

In a log-structured storage engine, merging and compaction are essential for several reasons:

  • Efficiency: Old, overwritten, or deleted records are removed, freeing up disk space and keeping storage lean.
  • Performance: By reducing the number of duplicate and obsolete entries, lookups become faster and require scanning less data.
  • Continuous operation: compaction runs in the background, so reads and writes can continue against the old segments until the process finishes. Once complete, the system seamlessly switches to the newly compacted segment.

The Log-Structured Tradeoff

The log-structured approach is optimized for fast, high-throughput writes, but it trades away the ability to update data in place. Without merging and compaction, the very advantage of fast writes would eventually be lost to data bloat and slow queries. Compaction is the mechanism that makes the log-structured approach viable for real-world, ever-changing data.

In summary: merging and compaction are the housekeeping processes that keep log-structured databases efficient and high-performing, ensuring that the simplicity and speed of append-only writes don't come at the cost of bloated storage or sluggish reads.

After compaction and merging, each segment has its own in-memory hash table, which makes reads faster.

Role of the Hash Table in Log-Structured Storage:

The hash table is the bridge between the write-optimized log structure and the need for fast, efficient reads. It enables log-structured storage engines (like Bitcask) to combine high write throughput with low-latency key lookups, making them practical for real-world workloads.

A hash table serves as an in-memory index that maps each key to the location (offset) of its most recent value in the log file or segment. Since data is never updated in place, the log may contain multiple versions of a key. The hash table always points to the latest version.

Without a hash table, reading a value would require scanning the entire log (O(n)), which is slow and inefficient. With a hash table, you can instantly find where the latest value for a key is stored, turning reads into O(1) operations (provided the hash table fits in memory).
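
Here's a rough sketch of that read path when each segment keeps its own hash table. The data structures are hypothetical, purely to show the newest-first lookup order:

def lookup(key, segments):
    # segments: list of (hash_index, records) pairs, newest first.
    for index, records in segments:
        offset = index.get(key)
        if offset is not None:
            return records[offset]           # first hit is the most recent value
    return None                              # key was never written, or compacted away

# Each segment's index maps key -> position within that segment.
new_segment = ({"Key2": 0}, ["9"])
old_segment = ({"Key1": 0, "Key2": 1}, ["123", "45"])

print(lookup("Key2", [new_segment, old_segment]))   # -> 9 (newest segment wins)
print(lookup("Key1", [new_segment, old_segment]))   # -> 123 (falls back to older one)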

Additional Design Considerations in Log-Structured Systems

Besides compaction and merging, there are several other critical aspects that must be addressed in real-world log-structured storage engines:

  1. Deleting records: Deletions don't remove data immediately. Instead, a tombstone record is appended to the log, marking the key as deleted. During the merge process, these tombstones tell the system to discard any values associated with the deleted keys.
  2. Crash recovery: To avoid scanning the entire log on every restart to reconstruct the in-memory hash table, systems periodically persist a snapshot of the index to disk. This allows faster recovery and initialization after a crash.
  3. Concurrency control: Log-structured systems typically simplify concurrency by allowing only one writer thread, while multiple reader threads are allowed (see the sketch after this list).

Concurrency and crash recovery are generally simpler with sequential, append-only writes, because we never have to worry about a crash happening while a value was being overwritten in place, which could leave a file containing part of the old value and part of the new one.
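
A minimal sketch of the first two points, tombstone deletes and index snapshots. The tombstone marker and file names are made up for illustration:

import json

TOMBSTONE = "__tombstone__"                  # hypothetical marker value

def delete(log, index, key):
    # Deletion appends a tombstone record; nothing is removed in place.
    log.append((key, TOMBSTONE))
    index[key] = len(log) - 1

def compact(log):
    # During compaction, keys whose latest record is a tombstone are dropped.
    latest = {}
    for key, value in log:
        latest[key] = value
    return [(k, v) for k, v in latest.items() if v != TOMBSTONE]

def snapshot_index(index, path):
    # Persist the in-memory hash table so a restart doesn't rescan the whole log.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(index, f)

def restore_index(path):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

log, index = [], {}
log.append(("Key1", "7")); index["Key1"] = 0
delete(log, index, "Key1")
print(compact(log))                          # -> []: Key1 is gone after compaction
snapshot_index(index, "index.snapshot")
print(restore_index("index.snapshot"))       # -> {'Key1': 1} (offset of the tombstone)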

Hash Table Index Limitations:

  • Hash tables must fit in memory, so if there are a lot of keys to index, this becomes a challenge.
  • Range queries are not efficient: if you want to scan over all keys between jmi000 and jmi999, you'd have to look up each key in that range individually in the in-memory hash map (see the sketch below).
  • As a rule of thumb: a hash index is great for locating an exact value, but not for range queries.
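
To see why, here is what a range scan is forced to look like with only a hash index. Here db is just a plain dict standing in for the hash-indexed store:

db = {"jmi042": "hello"}                     # stand-in for the hash-indexed store

results = {}
for i in range(1000):                        # every key in the range, one by one
    key = f"jmi{i:03d}"                      # jmi000, jmi001, ..., jmi999
    value = db.get(key)                      # one separate point lookup per key
    if value is not None:
        results[key] = value

print(results)                               # -> {'jmi042': 'hello'}

A B-Tree, by contrast, keeps keys sorted, so the same range can be read with a single sequential scan.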
