Primary Key Performance At Scale

Andrew Graaff

Published Jul 10, 2024

In Domain Driven Design we need to create entities that are always in a consistent state. We therefore cannot use Auto Increment or Sequences for their primary key (id). We need a UUID or something which can be generated in process that provides guarantees of uniqueness. In this article we will look at the performance of a few options.onfong

One option is to use SnowFlake Ids. These were popularised by Twitter. Unfortunately they require a server to issue a partition ID or a machine ID to the producer. At most scales this does not seem acceptable.

We will therefore compare the following with auto incrementing big integers.

UUID version 4
UUID version 7
UUID version 7 Base32 encoded
UUID version 7 Base58 encoded
UUID version 7 binary encoded
ULID

UUID version 4

UUIDs all have a format like ‘00000105–932c-4e83–98bd-fd3cbb92006a’. Depending on the version each segment is derived using a different algorithm. They are 36 characters long and will therefore occupy more disk space than a big integer.

Unfortunately the UUID V4 version is not lexically ordered. Relational database engines will have a harder time inserting the indexes into storage and range queries will suffer from the locality problem. We will see what practically this may means for our queries.

UUID version 7

It follows the standard UUID format (RFC 1142). However the first part of the UUID is timestamp based and is lexically ordered. Since they are ordered means inserts are faster. Being a UUID they are 36 characters.

UUID V7 Base 32

We can use UUID V7 but encode it as Base32. This will reduce the size to 26 characters and we will lose the dashes. It remains lexically ordered. Base32 is more human readable.

UUID V7 Base 58

We can use UUID V7 but encode as Base58. This will reduce the size to 22 characters and we will lose the dashes. Base58 is used rather than 64 as there are 58 characters amongst the various UTF8 character sets which are stable. It remains lexically ordered. Base 58 is smaller is less human readable.

UUID V7 Binary Encoded

We can use UUID V7 and binary encode for 16 byte size but it is NOT longer lexially ordered.

ULID

Ulids were created before UUID version 7. UUID version 7 is a ULID but with version and variant fields. ULID is 26 characters and lexically ordered.

Setup

Create a table with a primary key, a random name and a timestamp for each type of primary key type.
Seed each table with 100 million rows.
Store a lookup key for every 1000 rows seeded in the database for each type. This will be used to store lookup values without fetching them from the target database.
Create a table to store the results of the tests.
Create a Laravel console command to run the tests and write the results to a results table.
Use only database calls i.e. no query builder.
Don’t start the timer in any code setup phase e.g. fetching lookup values. These won’t be timed to limit the time takes to database operations.

Tests

We will perform the following tests using MySQL running via Docker compose.

Recommended by LinkedIn

Building go-logql: A Type-Safe LogQL Query Builder for…

Apoorv Garg 3 months ago

.NET 10 Small new features

Mark Price 6 months ago

Memoization in React

Fidisys 2 years ago

services:
  db:
    image: mysql:latest
    ports:
      - 3306:3306
    environment:
      MYSQL_ROOT_PASSWORD: test_password
      MYSQL_DATABASE: test_primary_key
      MYSQL_USER: test_user
      MYSQL_PASSWORD: test_password
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 4000M
    volumes:
      - ./db_data:/var/lib/mysql

The computer is restarted after every test
The Symfony UUID component to generate the UUID/ULIDs.
There are 4 test types. Tests types are all run concurrently. Each test has a different number of concurrent instances
All in all 22 tests are run concurrently
The average for each is presented in the graph showing the results.

Insert

Run 10 instances concurrently. Each instance generates 10,000 insert row and then inserts them 1 at a time. The time taken is measured against the time to insert the records and does not include setup time.

InsertLarge

This test is the only test that runs by itself and not concurrently with the other tests. 20 instances of the test are run for 200,000 total records inserted. It demonstrates the effects of the index not naturally being sorted.

Range Select

A random offset is generated and 5000 records from the offset are selected. 4 instances are run concurrently. This should surface locality issues. The query is shown below.

SELECT * FROM {TABLE_NAME} 
WHERE order_date >= {LOOKUP_ORDER_DATE) 
ORDER BY order_date ASC
LIMIT 1000
OFFSET {RANDOM_OFFSET}

Lookup

Run 10 instances concurrently. Fetching 10,000 records from a random offset in the lookup database. Loop over each value in our lookup list and fetch it.

SELECT * FROM {TABLE_NAME} WHERE id = ?

Count

Run 4 instances concurrently. SELECT count(*) is a surprisingly slow operation. Each entry in the primary key index must be counted. We will see the effect the various index types have on count performance.

SELECT count(*) FROM {TABLE_NAME}

Results

It is clear the UUID V4 is much slower than the other UUID variants. It performed better than expected in inserts. If inserts are sustained over longer periods its performance drops off. It is also noted that seeding a 100 million rows with a UUID V4 primary key takes at least 6 times as long as the other variants due to the keys being unordered and B-Tree lookups taking much longer.
Binary encoded UUID V7 performs will. But since it is unordered it suffers in insert performance as searching the index will be less efficient as the time ordered varieties.
UUID V7 Base 32 and ULID variants appear to be the best compromise.

Conclusion

Auto incrementing IDs are fastest but there are good reasons to choose UUIDs now that time ordered variants are available.

Links

Source: https://github.com/andrew-belac/benchmark-primary-key Results: https://docs.google.com/spreadsheets/d/1wycaBzCwhDV_obmu2HZCvtt4qFj2GNYL75YoKUnZK6o/edit?usp=sharing

To view or add a comment, sign in

Primary Key Performance At Scale

Andrew Graaff

UUID version 4

UUID version 7

UUID V7 Base 32

UUID V7 Base 58

UUID V7 Binary Encoded

ULID

Setup

Tests

Recommended by LinkedIn

Lookup

Results

Conclusion

Links

More articles by Andrew Graaff

Others also viewed

WebReports: What are such weird things like "pre-defined sort parameters"?

PathVariable vs RequestParam in Spring Boot: When to use them

Journey from AI to LLMs and MCP - 6 - Enter the Model Context Protocol (MCP) — The Interoperability Layer for AI Agents

Code Smell 126 - Fake Null Object

Solving LeetCode's Problem of the Day: "Delete Leaves With a Given Value"

Streamlit - The fine touch of data science in the software empire

Spark with dependency injection

GraphQL - Dawn of New Era

How Crawl4AI Solved My Toughest RAG Data Ingestion Challenges

A journey to a movie recommender in U-SQL

Explore content categories

UUID version 4

UUID version 7

UUID V7 Base 32

UUID V7 Base 58

UUID V7 Binary Encoded

ULID

Setup

Tests

Recommended by LinkedIn

Lookup

Results

Conclusion

Links

More articles by Andrew Graaff

Benchmarking Protocol Buffers and JSON - Information for Architects

AWS : Imperatives for Your Business

Concurrency and Cloud Based Serverless Solutions

Fully Formed: Building Self Contained Software Assets

Others also viewed

WebReports: What are such weird things like "pre-defined sort parameters"?

PathVariable vs RequestParam in Spring Boot: When to use them

Journey from AI to LLMs and MCP - 6 - Enter the Model Context Protocol (MCP) — The Interoperability Layer for AI Agents

Code Smell 126 - Fake Null Object

Solving LeetCode's Problem of the Day: "Delete Leaves With a Given Value"

Streamlit - The fine touch of data science in the software empire

Spark with dependency injection

GraphQL - Dawn of New Era

How Crawl4AI Solved My Toughest RAG Data Ingestion Challenges

A journey to a movie recommender in U-SQL

Explore content categories