LSM-Tree Indexing: Return to Basics and Move Forward

Motivation

Knowing how your database stores and retrieves data, and how it manages these operations together with your storage medium, is essential if you work in a domain with heavy daily usage. You may be surprised how small details can improve or backfire on your performance, so we will discuss these details while considering different ways of thinking.

Return to basics

A database, in simple words, is a place where you can store data and retrieve it from.

So, starting from this concept, let's try to implement a small DB (store & retrieve).

As a start, we will have an empty file called db.txt.

Then let's use two commands/methods to deal with this file as a DB. The first takes two arguments, a key and a value (a JSON object / CSV record / single value):

  append("112233","{"name" :"ahmed","age" : 15}");
  
  append("57587","{"name" : "eslam","age" : 20 }");

          

This will result in the following writes to the file:

  112233,{"name":"ahmed","age":15}
  57587,{"name":"eslam","age":20}

The second command/method takes one argument, a key; it scans the file line by line looking for that key and, if found, returns its value:

  find("112233");        

And this will retrieve the value associated with that key:

  {"name" :"ahmed","age" : 15}        

We can now say we have a database that supports insertion and retrieval, but what about update and delete?

For delete, we can add another method that takes one argument, a key, and scans the file; when it finds the key, it adds a sign or special character (let's use an exclamation mark) that marks the record as deleted, so it can be ignored by other operations.

  delete("112233");         

And this is the result:

  !112233,{"name":"ahmed","age":15}
  57587,{"name":"eslam","age":20}

When it comes to the update operation, to keep our database simple we will design our update to insert the same key again, but with the updated value, like this:

  append("112233","{"name" :"ahmed muhammed","age" : 15}");        
  !112233,{"name":"ahmed","age":15}
  57587,{"name":"eslam","age":20}
  112233,{"name":"ahmed muhammed","age":15}

Because of this update behavior, we have to modify the retrieve operation to return the most recent value of a given key, not the first occurrence.
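To make this concrete, here is a minimal sketch of the whole naive design in Java, assuming the key,value line format used above (the class name and file handling are my own):

  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.StandardOpenOption;
  import java.util.List;

  // A naive log DB: append-only writes, a '!' prefix as the delete marker,
  // and reads that return the most recent value of a key.
  public class NaiveDb {
      private final Path file = Path.of("db.txt");

      // Store / update: always append a new line at the end of the file.
      public void append(String key, String value) throws IOException {
          Files.writeString(file, key + "," + value + System.lineSeparator(),
                  StandardOpenOption.CREATE, StandardOpenOption.APPEND);
      }

      // Retrieve: scan every line, skip deleted rows, keep the last match
      // so that the most recent value of the key wins.
      public String find(String key) throws IOException {
          String latest = null;
          for (String line : Files.readAllLines(file)) {
              if (line.startsWith("!")) continue;      // deleted row, ignore
              int comma = line.indexOf(',');
              if (comma > 0 && line.substring(0, comma).equals(key)) {
                  latest = line.substring(comma + 1);
              }
          }
          return latest;
      }

      // Delete: mark every line of this key with '!' so readers skip it.
      public void delete(String key) throws IOException {
          List<String> lines = Files.readAllLines(file);
          for (int i = 0; i < lines.size(); i++) {
              if (lines.get(i).startsWith(key + ",")) {
                  lines.set(i, "!" + lines.get(i));
              }
          }
          Files.write(file, lines);
      }
  }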

Let's do some enhancements

In order to enhance retrieval, we can add a Map data structure that works as an index for our DB: its key is the key of the row, and its value is the latest byte offset of that key in our file. Instead of scanning, we just jump to the offset and start reading. For our file, the index would look like this (key, offset):

  57587,15
  112233,30        
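A sketch of this index in Java, assuming a hypothetical IndexedDb class (a real implementation would also rebuild the map from the file on startup):

  import java.io.IOException;
  import java.io.RandomAccessFile;
  import java.util.HashMap;
  import java.util.Map;

  // Sketch of the offset index: the map remembers the byte offset of the
  // latest line of every key, so reads can seek instead of scanning.
  public class IndexedDb {
      private final RandomAccessFile file;
      private final Map<String, Long> index = new HashMap<>();

      public IndexedDb(String path) throws IOException {
          this.file = new RandomAccessFile(path, "rw");
      }

      public void append(String key, String value) throws IOException {
          long offset = file.length();
          file.seek(offset);                         // always write at the end
          file.write((key + "," + value + "\n").getBytes());
          index.put(key, offset);                    // latest offset wins
      }

      public String find(String key) throws IOException {
          Long offset = index.get(key);
          if (offset == null) return null;           // key never written
          file.seek(offset);                         // jump, no scan
          String line = file.readLine();
          return line.substring(line.indexOf(',') + 1);
      }
  }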

We now have a database that supports CRUD operations. It is not very efficient, but it works!

Of course, this has a lot of limitations: it can't serve range queries, retrieving data is not efficient, duplicate rows waste space, deleted rows need a special marker, and there are concurrency issues.

So, in order to solve these issues, let's move forward.

Reduce duplication and save space

In order to solve duplication, we need a process that runs behind the scenes, takes our DB file, and merges duplicate keys into one distinct key holding the most recent value, freeing space. This is what is called the compaction process. In order not to run this operation over a large amount of data, we split our DB file into smaller segments and run the compaction operation continuously.


The result of this operation is a new, more compact DB file containing distinct keys with their most recent values.
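A minimal sketch of one compaction pass, simplified to a single segment; in a real multi-segment merge the delete marker would have to be kept until all older segments containing that key are merged too:

  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.util.LinkedHashMap;
  import java.util.Map;

  // Sketch of one compaction pass over a single segment: keep only the most
  // recent value per key and drop rows marked deleted with '!'.
  public class Compactor {
      public static void compact(Path segment, Path output) throws IOException {
          Map<String, String> latest = new LinkedHashMap<>();
          for (String line : Files.readAllLines(segment)) {
              if (line.startsWith("!")) {            // deleted row: forget the key
                  latest.remove(line.substring(1, line.indexOf(',')));
                  continue;
              }
              int comma = line.indexOf(',');
              latest.put(line.substring(0, comma), line.substring(comma + 1));
          }
          StringBuilder out = new StringBuilder();
          latest.forEach((k, v) -> out.append(k).append(',').append(v).append('\n'));
          Files.writeString(output, out.toString());
      }
  }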

Enhance retrieval performance

We all know that searching sorted data is much cheaper than searching unsorted data, so let's try to sort our data by key. While the compaction operation is running, we can apply the merge-sort algorithm to the files generated by write operations, producing a new file that is sorted by key and holds the most recent values, as sketched below.

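A sketch of that merge step, assuming each segment is already a sorted list of key,value lines:

  import java.util.ArrayList;
  import java.util.List;

  // Sketch of the merge step of merge sort, applied to two segments whose
  // "key,value" lines are already sorted by key; on equal keys the newer
  // segment's value wins, so the output holds the most recent values.
  public class SortedMerge {
      public static List<String> merge(List<String> older, List<String> newer) {
          List<String> out = new ArrayList<>();
          int i = 0, j = 0;
          while (i < older.size() && j < newer.size()) {
              String oldKey = older.get(i).split(",", 2)[0];
              String newKey = newer.get(j).split(",", 2)[0];
              int cmp = oldKey.compareTo(newKey);
              if (cmp < 0)      out.add(older.get(i++));
              else if (cmp > 0) out.add(newer.get(j++));
              else { out.add(newer.get(j++)); i++; }  // same key: newer wins
          }
          while (i < older.size()) out.add(older.get(i++));
          while (j < newer.size()) out.add(newer.get(j++));
          return out;
      }
  }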


By this we sorted our data lazily, as part of the compaction operation. But we can also build the sorted order eagerly, by keeping an in-memory tree data structure that holds our writes (rows are inserted into it first). Tree nodes are sorted by nature, so the tree can be written directly to disk when it reaches a predefined size/threshold; when multiple files have been generated by write operations, the compaction operation runs to free space and remove duplicates between files. This way has a risk, though: writes are kept in memory first, and may be lost when the OS crashes or any other failure happens. To solve this, we also write our data to disk (unsorted) in a log beside the tree; whenever a crash happens, we restore the tree from this log, and the log can be discarded once the tree has been written to disk.
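A sketch of this tree-plus-log idea in Java, with a TreeMap standing in for the tree (the threshold and file names are made up):

  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.StandardOpenOption;
  import java.util.TreeMap;

  // Sketch of the eager approach: writes go into a sorted in-memory tree
  // and, for crash safety, into an unsorted write-ahead log on disk.
  public class MemTable {
      private static final int THRESHOLD = 1000;     // flush after this many keys
      private final TreeMap<String, String> tree = new TreeMap<>();
      private final Path log = Path.of("wal.log");
      private int segmentNo = 0;

      public void put(String key, String value) throws IOException {
          // 1. Append to the log first so a crash cannot lose the write.
          Files.writeString(log, key + "," + value + System.lineSeparator(),
                  StandardOpenOption.CREATE, StandardOpenOption.APPEND);
          // 2. Insert into the tree; TreeMap keeps keys sorted.
          tree.put(key, value);
          if (tree.size() >= THRESHOLD) flush();
      }

      // Write the tree to disk as a sorted segment, then discard the log.
      private void flush() throws IOException {
          StringBuilder out = new StringBuilder();
          tree.forEach((k, v) -> out.append(k).append(',').append(v).append('\n'));
          Files.writeString(Path.of("segment-" + segmentNo++ + ".txt"), out.toString());
          tree.clear();
          Files.deleteIfExists(log);
      }
  }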

With our sorted data, we can easily build a small in-memory index that does not hold all the keys. Instead, it holds only the start position of each group of similar keys, and we find the exact key by iterating from there (like a phone book: go to the letter S, then look for Samy).

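A sketch of such a sparse index; the keys and offsets stored are only every Nth entry of the sorted segment:

  import java.util.Map;
  import java.util.TreeMap;

  // Sketch of a sparse index over a sorted segment: only every Nth key is
  // indexed with its byte offset; a lookup jumps to the nearest indexed key
  // at or before the target and scans forward from there.
  public class SparseIndex {
      private final TreeMap<String, Long> offsets = new TreeMap<>();

      public void add(String key, long offset) {
          offsets.put(key, offset);
      }

      // Like opening the phone book at the letter S before looking for Samy.
      public long scanStart(String key) {
          Map.Entry<String, Long> floor = offsets.floorEntry(key);
          return floor == null ? 0L : floor.getValue();
      }
  }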

That covers the write operation.

When it comes to the read operation, we first search at the index level (the in-memory tree); if the key is not found, we search the most recent file, then older and older ones, and so on.
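A sketch of that read order, treating the MemTable and the segments as plain maps for illustration:

  import java.util.List;
  import java.util.Map;

  // Sketch of the read path: MemTable first, then segments from newest to
  // oldest; the first hit is by definition the most recent value.
  public class Reader {
      public static String read(Map<String, String> memTable,
                                List<Map<String, String>> segmentsNewestFirst,
                                String key) {
          String value = memTable.get(key);
          if (value != null) return value;
          for (Map<String, String> segment : segmentsNewestFirst) {
              value = segment.get(key);
              if (value != null) return value;   // newer segments shadow older
          }
          return null;                           // key not found anywhere
      }
  }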

Regarding concurrency with multiple reads and writes: writes have their own path, they go to the tree, and whenever it reaches the threshold it is written to disk, then the compaction process runs behind the scenes. Meanwhile, reads can be served from the original segments generated earlier, and whenever a compaction operation is done, read operations are directed to the newly generated segment/file.


The benefits we have gained so far:

  • save space
  • enhance the retrieve operation
  • serve range queries in one way or another
  • handle concurrency

What we built is known as an SSTable (Sorted Strings Table), which resides on disk, while the in-memory tree that buffers and sorts writes is known as the MemTable. The whole process is known as the LSM-Tree (Log-Structured Merge-Tree), a type of indexing mechanism whose steps are as below:

1 - Log your writes into an in-memory tree data structure (the MemTable).

2 - When the tree reaches a predefined size/threshold, write it to disk as a segment.

3 - After a predefined amount of time, run the compaction operation to merge segments.

4 - Repeat.

Why does the update operation insert a new row instead of updating the existing one?

Well, not all databases do it this way. Some use what is called immutable data: when it comes to an update, they just insert a new row instead of modifying the existing one, like Postgres. Other databases update the existing row but keep track of its history in what are called undo logs, like MySQL. This relates to how they deal with concurrency (MVCC), isolation, and rollback. We will introduce these topics in other articles, but I will list some resources for them.

Resources

  • Chapter 3: Storage and Retrieval, from Designing Data-Intensive Applications

