Performance benchmarking for Cloudflare D1 SQLite database

Performance benchmarking for Cloudflare D1 SQLite database

The main goal is comparing executing time for the users (1M records) and posts (10M records) tables with the id column as integer, UUID v4, and ULID.

Prehistory

As a frontend developer, I mostly worked with MongoDB in my pet projects. Currently, I'm working on a bigger product. That's a reason why I decided to rethink my decisions regarding chosen infrastructure, including database. I'm migrating to Cloudflare infra. Shortly, because it has friendly DX, transparent pricing, wider free packages, services I need, etc. So, the first step is migrating from usual MongoDB to Cloudflare D1 database based on SQLite. As a frontend developer, I have no idea how to write good SQL queries. During the investigation, I found posts about Prisma and how it's slow. That's funny. My search brought me to Drizzle . It has pretty nice documentations, it's free and open-sourced, it's faster than Prisma regarding provided benchmarks, I found many good feedbacks about it. When I started writing my first schemas, I started thinking about the architecture, security, and performance aspects on the beginning, just because I usually do that. And first point, that I'm using Nuxt application. I have a TypeScript interface provided by Drizzle from a table schema. This interface and data object type shared across server, API, and storefront components. It means, all the fields will be visible in the Network tab on request once it's fetched. I want to avoid cases when I get /api/users/1 explicit id field due to aesthetic and security reasons (yes, I know about session tokens). From the past while I was working with MongoDB, I remember that typically the _id field auto-generated by MongoDB uses UUID format. Not sure about the exact UUID version, but I'm sure it's not just an auto-incremented integer. I was just interested what if it can negatively affect a performance of the SQLite database.

Preparation for performance benchmarking

I'm using a Nuxt 3 application (with Nitro including), nitro-dev-cloudflare, wrangler, Miniflare for D1 local development, Drizzle ORM, Drizzle Kit, Drizzle Studio, and Drizzle Seed functionality. I decided to seed the database with 1 million users and 10 millions posts. But found out, that D1 has pretty limited request row. In my case, I could see only around 10-15 users and posts, then I just got error:

too many SQL variables at offset 425: SQLITE_ERROR        
Article content
too many SQL variables at offset 425: SQLITE_ERROR

So, instead of using the drizzle-seed package, I just manually seeded the tables in a for loop. Also, I had to increase RAM memory to 10 GB (default is 2 GB) for Node in my ~/.zshrc:

export NODE_OPTIONS="--max-old-space-size=10000"        
Article content
FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

Request example

It's pretty similar for all cases:

Article content

Performance benchmarking — Integer

Response time is ~480ms:

Article content

Performance benchmarking — UUID v4

Response time is ~1789ms:

Article content

Performance benchmarking — ULID

Response time is ~878ms.

In comparison with UUID v4 (36 chars), ULID has fewer chars (26 chars), but it's lexicographically sortable, time-based. I guessed it has to be better for performance and database size if I want to keep UUID flow for the id column.

Article content

Conclusions

  1. If you really care about performance and expect 1M+ records in tables on your product, I guess it'd be better to stay around auto-incremented integer. If not, it doesn't matter what you choose.
  2. If you want to have the UUID flow, pick ULID over UUID v4. Also, take a look at UUID v7. Feel free to use the repo below, to test other approaches, including uuidv7.
  3. drizzle-seed needs adjustments for D1 for seeding more than 20 records.
  4. Drizzle Studio is pretty fast, clear, and it has nice UX. Not a PhpMyAdmin if you are pretty old as me :D (I'm 29, LOL)

The best option

Andrii Sherman suggested to me the best approach. I can create a custom type via Drizzle ORM that returns integer in the same way for the id column. So, just go forward with auto-incremented integer primary key. But! It can be encoded only on select (no need for insert) to hide the exact value when returned from API.

Screenshot explanation:

Article content

Database example:

Article content

API response:

Article content
Get all users with their posts (limited)
Article content
Get a user by ID (already encoded) with their posts (limited)

So, it's equally fast as just using integer. In addition, I encoded it via the Hashids NPM package. It means, it could be decoded only when you know exact value of a decoding secret key. It solves what I meant. Furthermore, it seems much better in the API response, but still be the fastest option as needed due to the performance requirements.

There's a related commit.

There's a related GitHub repository:

Article content

Subscribe to my GitHub and LinkedIn profiles here to stay tuned for future investigations and insides about a product I'm working on.

Credits

Huge thanks to Andrii Sherman for time, assistance, useful tips, and for the detailed information that was pretty handy for this investigation.



To view or add a comment, sign in

More articles by Serhii Chernenko

Others also viewed

Explore content categories