Primary Key Performance At Scale
In Domain Driven Design we need to create entities that are always in a consistent state. We therefore cannot use Auto Increment or Sequences for their primary key (id). We need a UUID or something which can be generated in process that provides guarantees of uniqueness. In this article we will look at the performance of a few options
One option is to use SnowFlake Ids. These were popularised by Twitter. Unfortunately they require a server to issue a partition ID or a machine ID to the producer. At most scales this does not seem acceptable.
We will therefore compare the following with auto incrementing big integers.
UUID version 4
UUIDs all have a format like ‘00000105–932c-4e83–98bd-fd3cbb92006a’. Depending on the version each segment is derived using a different algorithm. They are 36 characters long and will therefore occupy more disk space than a big integer.
Unfortunately the UUID V4 version is not lexically ordered
UUID version 7
It follows the standard UUID format (RFC 1142). However the first part of the UUID is timestamp based and is lexically ordered. Since they are ordered means inserts are faster. Being a UUID they are 36 characters.
UUID V7 Base 32
We can use UUID V7 but encode it as Base32. This will reduce the size to 26 characters and we will lose the dashes. It remains lexically ordered. Base32 is more human readable.
UUID V7 Base 58
We can use UUID V7 but encode as Base58. This will reduce the size to 22 characters and we will lose the dashes. Base58 is used rather than 64 as there are 58 characters amongst the various UTF8 character sets which are stable. It remains lexically ordered. Base 58 is smaller is less human readable.
UUID V7 Binary Encoded
We can use UUID V7 and binary encode for 16 byte size but it is NOT longer lexially ordered.
ULID
Ulids were created before UUID version 7. UUID version 7 is a ULID but with version and variant fields. ULID is 26 characters and lexically ordered.
Setup
Tests
We will perform the following tests using MySQL running via Docker compose.
Recommended by LinkedIn
services:
db:
image: mysql:latest
ports:
- 3306:3306
environment:
MYSQL_ROOT_PASSWORD: test_password
MYSQL_DATABASE: test_primary_key
MYSQL_USER: test_user
MYSQL_PASSWORD: test_password
deploy:
resources:
limits:
cpus: "2.0"
memory: 4000M
volumes:
- ./db_data:/var/lib/mysql
Insert
Run 10 instances concurrently. Each instance generates 10,000 insert row and then inserts them 1 at a time. The time taken is measured against the time to insert the records and does not include setup time.
InsertLarge
This test is the only test that runs by itself and not concurrently with the other tests. 20 instances of the test are run for 200,000 total records inserted. It demonstrates the effects of the index not naturally being sorted.
Range Select
A random offset is generated and 5000 records from the offset are selected. 4 instances are run concurrently. This should surface locality issues. The query is shown below.
SELECT * FROM {TABLE_NAME}
WHERE order_date >= {LOOKUP_ORDER_DATE)
ORDER BY order_date ASC
LIMIT 1000
OFFSET {RANDOM_OFFSET}
Lookup
Run 10 instances concurrently. Fetching 10,000 records from a random offset in the lookup database. Loop over each value in our lookup list and fetch it.
SELECT * FROM {TABLE_NAME} WHERE id = ?
Count
Run 4 instances concurrently. SELECT count(*) is a surprisingly slow operation. Each entry in the primary key index must be counted. We will see the effect the various index types have on count performance.
SELECT count(*) FROM {TABLE_NAME}
Results
Conclusion
Auto incrementing IDs are fastest but there are good reasons to choose UUIDs now that time ordered variants are available.