Log Structured Storage Engines
Notes from Designing Data-Intensive Applications, Chapter 3: Storage and Retrieval
There are two main families of storage engines: log-structured engines and page-oriented engines (such as B-trees).
This article focuses on log-structured storage engines.
Log-structured engines are extremely fast for writes, but without an index, a read means scanning the entire log, which is O(n): not great.
Indexes are additional metadata derived from the primary data that make queries much faster. For example, a hash index is an in-memory key-value map that stores, for each key, the byte offset of its latest value inside a log segment (for key-value log-structured data).
But, as always, there’s a tradeoff: indexes speed up reads, but slow down writes, because every write now means updating the index, too.
An example of a log-structured storage engine is Bitcask, the default storage engine of Riak.
Merging and Compaction in Log-Structured Storage: Why They Matter
Modern databases often use a log-structured approach for storage, where all writes are appended sequentially to disk rather than updating data in place. This design brings significant benefits for write performance and crash recovery, as sequential writes are faster and more reliable than random disk updates. However, it also introduces new challenges: over time, the log accumulates multiple versions of the same key, as well as records for deleted data. Without management, this leads to wasted disk space and slower read performance.
What Are Merging and Compaction?
Merging and compaction are background processes that periodically reorganize storage in log-structured systems. Compaction throws away duplicate keys within a segment, keeping only the most recent update for each key. Merging combines several compacted segments into one, since compaction often makes segments much smaller. Both run in the background over frozen (already-closed) segments, so reads and writes continue uninterrupted; once a merged segment has been written, reads switch over to it and the old segment files are deleted.
For example, consider these two log segments before compaction:
segment1: [Key1: 123 | Key2: 45 | Key1: 7 | Key3: 8]
segment2: [Key2: 9 | Key1: 6 | Key3: 11 | Key1: 7]
After compaction, only the most recent values remain:
final segment: [Key1: 7 | Key2: 9 | Key3: 11]
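The compaction-and-merge step above can be sketched in a few lines of Python. This is a minimal illustration, not a real engine: segments are modeled as in-order lists of (key, value) records, and the function name is hypothetical.

```python
def compact_and_merge(*segments):
    """Merge segments (passed oldest first), keeping only the latest value per key."""
    latest = {}
    for segment in segments:          # later segments take precedence over earlier ones
        for key, value in segment:    # within a segment, later records win
            latest[key] = value
    # Write out one compacted segment containing each key exactly once.
    return list(latest.items())

segment1 = [("Key1", 123), ("Key2", 45), ("Key1", 7), ("Key3", 8)]
segment2 = [("Key2", 9), ("Key1", 6), ("Key3", 11), ("Key1", 7)]

print(compact_and_merge(segment1, segment2))
# [('Key1', 7), ('Key2', 9), ('Key3', 11)]
```

Because each key appears exactly once in the output, the merged segment is never larger than the set of live keys, no matter how many overwrites the input segments contained.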
Why Are They Important?
In a log-structured storage engine, merging and compaction are essential for several reasons: they reclaim the disk space occupied by overwritten and deleted values, they keep the number of segments small so a read never has to consult many per-segment indexes, and they prevent the database from eventually filling the disk with stale data.
The Log-Structured Tradeoff
The log-structured approach is optimized for fast, high-throughput writes, but it trades away the ability to update data in place. Without merging and compaction, the very advantage of fast writes would eventually be lost to data bloat and slow queries. Compaction is the mechanism that makes the log-structured approach viable for real-world, ever-changing data.
In summary: Merging and compaction are the housekeeping processes that keep log-structured databases efficient and high-performing, ensuring that the simplicity and speed of append-only writes don’t come at the cost of bloated storage or sluggish reads.
After compaction and merging, each segment has its own in-memory hash table, which keeps reads fast: a lookup checks the most recent segment’s hash table first, then falls back to older segments.
Role of the Hash Table in Log-Structured Storage:
The hash table is the bridge between the write-optimized log structure and the need for fast, efficient reads. It enables log-structured storage engines (like Bitcask) to combine high write throughput with low-latency key lookups, making them practical for real-world workloads.
A hash table serves as an in-memory index that maps each key to the location (offset) of its most recent value in the log file or segment. Since data is never updated in place, the log may contain multiple versions of a key. The hash table always points to the latest version.
Without a hash table, reading a value would require scanning the entire log (O(n)), which is slow and inefficient. With a hash table, you can instantly find where the latest value for a key is stored, turning reads into O(1) operations (provided the hash table fits in memory).
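Here is a minimal Bitcask-style sketch of that idea: an append-only log file plus an in-memory hash table mapping each key to the byte offset of its latest value. The class name and the record framing (one `key,value` line per record, keys without commas) are illustrative assumptions, not Bitcask’s actual on-disk format.

```python
import os
import tempfile

class LogKVStore:
    def __init__(self, path):
        self.index = {}                  # key -> byte offset of the latest record
        self.log = open(path, "ab+")     # append-only log: data is never updated in place

    def set(self, key, value):
        self.log.seek(0, os.SEEK_END)            # appends always go at the end
        self.index[key] = self.log.tell()        # index now points at the newest version
        self.log.write(f"{key},{value}\n".encode())
        self.log.flush()

    def get(self, key):
        offset = self.index.get(key)             # O(1) lookup instead of an O(n) scan
        if offset is None:
            return None
        self.log.seek(offset)
        _, value = self.log.readline().decode().rstrip("\n").split(",", 1)
        return value

path = os.path.join(tempfile.mkdtemp(), "demo.log")
db = LogKVStore(path)
db.set("Key1", "123")
db.set("Key1", "7")       # old value stays in the log; only the index moves
print(db.get("Key1"))     # prints "7": the index points at the latest append
```

Note that overwriting `Key1` leaves the old record on disk; reclaiming that space is exactly what compaction is for.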
Additional Design Considerations in Log-Structured Systems
Besides compaction and merging, there are several other critical aspects that must be addressed in real-world log-structured storage engines:
Typically only one writer thread is allowed, while multiple threads may read concurrently, since records are immutable once appended.
Concurrency and crash recovery are generally simpler with sequential, append-only writes: you never have to handle the case where a crash happens midway through overwriting a value, which could leave a file containing part of the old value and part of the new one.
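This crash-recovery property can be illustrated with a small sketch. Assuming the same hypothetical `key,value\n` record format as above, the in-memory hash table can be rebuilt after a crash by scanning the log; a torn final record (one missing its newline because the crash interrupted an append) is simply discarded, and no older value is ever half-overwritten because nothing is updated in place.

```python
def rebuild_index(path):
    """Rebuild the in-memory hash table by scanning an append-only log file."""
    index = {}
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break                           # clean end of log
            if not line.endswith(b"\n"):
                break                           # torn record from a mid-append crash: discard it
            key = line.split(b",", 1)[0].decode()
            index[key] = offset                 # later records win
    return index
```

Real engines (Bitcask included) also use checksums to detect corruption, and Bitcask speeds up this scan by persisting snapshots of its hash maps, but the recovery principle is the same.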
Hash Table Index limitations: the hash table must fit entirely in memory, so a very large number of keys becomes a problem; and range queries are inefficient, because keys are not stored in sorted order, so you cannot scan a range of keys and must look each one up individually.