LSM-Tree Indexing: Return to Basics and Move Forward

Motivation

Knowing how your database stores and retrieves data, and how it manages these operations together with your storage medium, is essential if you work in a domain with heavy daily usage. You may be surprised how small details can improve or backfire on your performance, so we will discuss these details while considering different ways of thinking.

Return to basics

A database, in simple words, is a place where you can store data and retrieve it from.

So, starting from this concept, let's try to implement a small DB (store & retrieve).

As a start, we will have an empty file called db.txt.

Then let's use two commands/methods to deal with this file as a DB. The first takes two arguments, a key and a value (a JSON object / CSV record / single value):

  append("112233","{"name" :"ahmed","age" : 15}");
  
  append("57587","{"name" : "eslam","age" : 20 }");

          

This will result in the following writes to the file:

  112233,{"name":"ahmed","age":15}
  57587,{"name":"eslam","age":20}

The second command/method takes one argument, a key; it scans the file line by line looking for that key and, if found, returns its value:

  find("112233");        

And this will retrieve the value associated with that key:

  {"name" :"ahmed","age" : 15}        

We can now say we have a database that supports insertion and retrieval, but what about update and delete?

For delete, we can add another method that takes one argument, a key, and scans the file; when it finds the key, it adds a sign or special character (let's use an exclamation mark) that marks the record as deleted, so it can be ignored by other operations.

  delete("112233");         

And this is the result:

  !112233,{"name":"ahmed","age":15}
  57587,{"name":"eslam","age":20}

When it comes to the update operation, to keep our database simple we will design our update to insert the same key again, but with the updated value, like this:

  append("112233","{"name" :"ahmed muhammed","age" : 15}");        
  !112233,{"name":"ahmed","age":15}
  57587,{"name":"eslam","age":20}
  112233,{"name":"ahmed muhammed","age":15}

Because of this update behavior, we have to modify the retrieve operation to return the most recent value of a given key, not the first occurrence.
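To make this concrete, here is a minimal sketch of the whole naive design in Java, assuming the key,value line format used above (the class name and file handling are my own):

  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.StandardOpenOption;
  import java.util.List;

  // A naive log DB: append-only writes, a '!' prefix as the delete marker,
  // and reads that return the most recent value of a key.
  public class NaiveDb {
      private final Path file = Path.of("db.txt");

      // Store / update: always append a new line at the end of the file.
      public void append(String key, String value) throws IOException {
          Files.writeString(file, key + "," + value + System.lineSeparator(),
                  StandardOpenOption.CREATE, StandardOpenOption.APPEND);
      }

      // Retrieve: scan every line, skip deleted rows, keep the last match
      // so that the most recent value of the key wins.
      public String find(String key) throws IOException {
          String latest = null;
          for (String line : Files.readAllLines(file)) {
              if (line.startsWith("!")) continue;      // deleted row, ignore
              int comma = line.indexOf(',');
              if (comma > 0 && line.substring(0, comma).equals(key)) {
                  latest = line.substring(comma + 1);
              }
          }
          return latest;
      }

      // Delete: mark every line of this key with '!' so readers skip it.
      public void delete(String key) throws IOException {
          List<String> lines = Files.readAllLines(file);
          for (int i = 0; i < lines.size(); i++) {
              if (lines.get(i).startsWith(key + ",")) {
                  lines.set(i, "!" + lines.get(i));
              }
          }
          Files.write(file, lines);
      }
  }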

Let's do some enhancements

In order to enhance retrieval, we can add a Map data structure that works as an index for our DB: its key is the key of the row, and its value is the latest byte offset of that key in our file. Instead of scanning, we just jump to the offset and start reading. For our file, the index would look like this (key, offset):

  57587,15
  112233,30        
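A sketch of this index in Java, assuming a hypothetical IndexedDb class (a real implementation would also rebuild the map from the file on startup):

  import java.io.IOException;
  import java.io.RandomAccessFile;
  import java.util.HashMap;
  import java.util.Map;

  // Sketch of the offset index: the map remembers the byte offset of the
  // latest line of every key, so reads can seek instead of scanning.
  public class IndexedDb {
      private final RandomAccessFile file;
      private final Map<String, Long> index = new HashMap<>();

      public IndexedDb(String path) throws IOException {
          this.file = new RandomAccessFile(path, "rw");
      }

      public void append(String key, String value) throws IOException {
          long offset = file.length();
          file.seek(offset);                         // always write at the end
          file.write((key + "," + value + "\n").getBytes());
          index.put(key, offset);                    // latest offset wins
      }

      public String find(String key) throws IOException {
          Long offset = index.get(key);
          if (offset == null) return null;           // key never written
          file.seek(offset);                         // jump, no scan
          String line = file.readLine();
          return line.substring(line.indexOf(',') + 1);
      }
  }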

We now have a database that supports CRUD operations. It is not very efficient, but it works!

Of course, this has a lot of limitations: it can't serve range queries, retrieving data is not efficient, duplicate rows waste space, deleted rows need a special marker, and there are concurrency issues.

So, in order to solve these issues, let's move forward.

Reduce duplication and save space

In order to solve duplication, we need a process that runs behind the scenes, takes our DB file, and merges duplicate keys into one distinct key holding the most recent value, freeing space. This is what is called the compaction process. In order not to run this operation over a large amount of data, we split our DB file into smaller segments and run the compaction operation continuously.


The result of this operation is a new, more compact DB file containing distinct keys with their most recent values.
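A minimal sketch of one compaction pass, simplified to a single segment; in a real multi-segment merge the delete marker would have to be kept until all older segments containing that key are merged too:

  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.util.LinkedHashMap;
  import java.util.Map;

  // Sketch of one compaction pass over a single segment: keep only the most
  // recent value per key and drop rows marked deleted with '!'.
  public class Compactor {
      public static void compact(Path segment, Path output) throws IOException {
          Map<String, String> latest = new LinkedHashMap<>();
          for (String line : Files.readAllLines(segment)) {
              if (line.startsWith("!")) {            // deleted row: forget the key
                  latest.remove(line.substring(1, line.indexOf(',')));
                  continue;
              }
              int comma = line.indexOf(',');
              latest.put(line.substring(0, comma), line.substring(comma + 1));
          }
          StringBuilder out = new StringBuilder();
          latest.forEach((k, v) -> out.append(k).append(',').append(v).append('\n'));
          Files.writeString(output, out.toString());
      }
  }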

Enhance retrieval performance

We all know that searching sorted data is much cheaper than searching unsorted data, so let's try to sort our data by key. While the compaction operation is running, we can apply the merge-sort algorithm to the files generated by write operations, producing a new file that is sorted by key and holds the most recent values, as sketched below.

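A sketch of that merge step, assuming each segment is already a sorted list of key,value lines:

  import java.util.ArrayList;
  import java.util.List;

  // Sketch of the merge step of merge sort, applied to two segments whose
  // "key,value" lines are already sorted by key; on equal keys the newer
  // segment's value wins, so the output holds the most recent values.
  public class SortedMerge {
      public static List<String> merge(List<String> older, List<String> newer) {
          List<String> out = new ArrayList<>();
          int i = 0, j = 0;
          while (i < older.size() && j < newer.size()) {
              String oldKey = older.get(i).split(",", 2)[0];
              String newKey = newer.get(j).split(",", 2)[0];
              int cmp = oldKey.compareTo(newKey);
              if (cmp < 0)      out.add(older.get(i++));
              else if (cmp > 0) out.add(newer.get(j++));
              else { out.add(newer.get(j++)); i++; }  // same key: newer wins
          }
          while (i < older.size()) out.add(older.get(i++));
          while (j < newer.size()) out.add(newer.get(j++));
          return out;
      }
  }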


By this we sorted our data lazily, as part of the compaction operation. But we can also build the sorted order eagerly, by keeping an in-memory tree data structure that holds our writes (rows are inserted into it first). Tree nodes are sorted by nature, so the tree can be written directly to disk when it reaches a predefined size/threshold; when multiple files have been generated by write operations, the compaction operation runs to free space and remove duplicates between files. This way has a risk, though: writes are kept in memory first, and may be lost when the OS crashes or any other failure happens. To solve this, we also write our data to disk (unsorted) in a log beside the tree; whenever a crash happens, we restore the tree from this log, and the log can be discarded once the tree has been written to disk.
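A sketch of this tree-plus-log idea in Java, with a TreeMap standing in for the tree (the threshold and file names are made up):

  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.StandardOpenOption;
  import java.util.TreeMap;

  // Sketch of the eager approach: writes go into a sorted in-memory tree
  // and, for crash safety, into an unsorted write-ahead log on disk.
  public class MemTable {
      private static final int THRESHOLD = 1000;     // flush after this many keys
      private final TreeMap<String, String> tree = new TreeMap<>();
      private final Path log = Path.of("wal.log");
      private int segmentNo = 0;

      public void put(String key, String value) throws IOException {
          // 1. Append to the log first so a crash cannot lose the write.
          Files.writeString(log, key + "," + value + System.lineSeparator(),
                  StandardOpenOption.CREATE, StandardOpenOption.APPEND);
          // 2. Insert into the tree; TreeMap keeps keys sorted.
          tree.put(key, value);
          if (tree.size() >= THRESHOLD) flush();
      }

      // Write the tree to disk as a sorted segment, then discard the log.
      private void flush() throws IOException {
          StringBuilder out = new StringBuilder();
          tree.forEach((k, v) -> out.append(k).append(',').append(v).append('\n'));
          Files.writeString(Path.of("segment-" + segmentNo++ + ".txt"), out.toString());
          tree.clear();
          Files.deleteIfExists(log);
      }
  }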

With our sorted data, we can easily build a small in-memory index that does not hold all the keys. Instead, it holds only the start position of each group of similar keys, and we find the exact key by iterating from there (like a phone book: go to the letter S, then look for Samy).

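A sketch of such a sparse index; the keys and offsets stored are only every Nth entry of the sorted segment:

  import java.util.Map;
  import java.util.TreeMap;

  // Sketch of a sparse index over a sorted segment: only every Nth key is
  // indexed with its byte offset; a lookup jumps to the nearest indexed key
  // at or before the target and scans forward from there.
  public class SparseIndex {
      private final TreeMap<String, Long> offsets = new TreeMap<>();

      public void add(String key, long offset) {
          offsets.put(key, offset);
      }

      // Like opening the phone book at the letter S before looking for Samy.
      public long scanStart(String key) {
          Map.Entry<String, Long> floor = offsets.floorEntry(key);
          return floor == null ? 0L : floor.getValue();
      }
  }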

That covers the write operation.

When it comes to the read operation, we first search at the index level (the in-memory tree); if the key is not found, we search the most recent file, then older and older ones, and so on.
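A sketch of that read order, treating the MemTable and the segments as plain maps for illustration:

  import java.util.List;
  import java.util.Map;

  // Sketch of the read path: MemTable first, then segments from newest to
  // oldest; the first hit is by definition the most recent value.
  public class Reader {
      public static String read(Map<String, String> memTable,
                                List<Map<String, String>> segmentsNewestFirst,
                                String key) {
          String value = memTable.get(key);
          if (value != null) return value;
          for (Map<String, String> segment : segmentsNewestFirst) {
              value = segment.get(key);
              if (value != null) return value;   // newer segments shadow older
          }
          return null;                           // key not found anywhere
      }
  }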

Regarding concurrency with multiple reads and writes: writes have their own path, they go to the tree, and whenever it reaches the threshold it is written to disk, then the compaction process runs behind the scenes. Meanwhile, reads can be served from the original segments generated earlier, and whenever a compaction operation is done, read operations are directed to the newly generated segment/file.


The benefits we have gained so far:

  • save space
  • enhance the retrieve operation
  • serve range queries in one way or another
  • handle concurrency

What we built is known as an SSTable (Sorted Strings Table), which resides on disk, while the in-memory tree that buffers and sorts writes is known as the MemTable. The whole process is known as the LSM-Tree (Log-Structured Merge-Tree), a type of indexing mechanism whose steps are as below:

1 - Log your writes into an in-memory tree data structure (the MemTable).

2 - When the tree reaches a predefined size/threshold, write it to disk as a segment.

3 - After a predefined amount of time, run the compaction operation to merge segments.

4 - Repeat.

Why does the update operation insert a new row instead of updating the existing one?

Well, not all databases do it this way. Some use what is called immutable data: when it comes to an update, they just insert a new row instead of modifying the existing one, like Postgres. Other databases update the existing row but keep track of its history in what are called undo logs, like MySQL. This relates to how they deal with concurrency (MVCC), isolation, and rollback. We will introduce these topics in other articles, but I will list some resources for them.

Resources

  • Chapter 3: Storage and Retrieval, from Designing Data-Intensive Applications

