PouchDB Mini-Dive: Performance Profiling - Part 1
Introduction
In some of my earlier articles on PouchDB and a handful of its plugins, I've been making the case that PouchDB can serve as a handy utility for quickly prototyping new applications. All this talk eventually led me down a new line of questioning: once you've prototyped your application with it, just how performant is it as you scale up?
If PouchDB has the requisite performance chops, this could prove to be a potent combination: you could even publicly deploy your application prototype for alpha testing by real users without having to swap out PouchDB for some other solution! How awesome would that be?!
Today we'll be investigating insertion performance in various scenarios, and in a future article we'll look at query performance with the .find() API.
The Setup
2021 16" MacBook Pro - M1 Pro 16GB
macOS Ventura 13.5.1 (22G90)
NodeJS v16.17.0
PouchDB@8.0.1 (Local DB w/ LevelDB Storage Engine)
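For reference, the database under test is just a plain local PouchDB instance in NodeJS, which uses the LevelDB-backed adapter by default. A minimal setup sketch (the 'perf-test' database name is only an illustrative placeholder):

const PouchDB = require('pouchdb');

// In NodeJS, require('pouchdb') defaults to the LevelDB-backed adapter for local databases
const db = new PouchDB('perf-test');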
Show Me The Data
To get a reference/baseline, I measured insertion times for various record quantities, where each record looks like the following:
const name = 'test';
const link = 'http://www.example.com/example1?=example=example';

const doc = {
  name,
  link,
  name2: name,
  link2: link,
  test1: { name, link },
  test4: { name, link },
  test5: { name, link },
  test6: { name, link },
  test7: { name, link }
};
I've duplicated name and link in order to artificially expand the size of the document to better simulate the character of real-world data.
Because this initial measurement is a baseline, we want to provide ideal initial conditions before measuring each bulk insertion attempt - a completely empty database.
Side Note: Inserting 100K records means issuing 100K distinct Promises at once and waiting for all of them to resolve in parallel with Promise.allSettled().
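For concreteness, here's roughly how each timed run could be structured. This is a sketch rather than my exact benchmark script - the timedInsert() function and docTemplate name are illustrative assumptions - but the one-Promise-per-document approach and Promise.allSettled() mirror the side note above.

const { performance } = require('perf_hooks');

// The document shape from above, duplicated fields and all
const name = 'test';
const link = 'http://www.example.com/example1?=example=example';
const docTemplate = {
  name, link, name2: name, link2: link,
  test1: { name, link }, test4: { name, link }, test5: { name, link },
  test6: { name, link }, test7: { name, link }
};

// Issue `count` inserts in parallel and return the elapsed time in milliseconds
async function timedInsert(db, count) {
  const start = performance.now();
  const inserts = [];
  for (let i = 0; i < count; i++) {
    inserts.push(db.post({ ...docTemplate })); // db.post() assigns a fresh _id to each copy
  }
  await Promise.allSettled(inserts);           // wait for every insert to settle
  return performance.now() - start;
}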
Here are the measurements:
Inserting 1 document: 2.18 ms (milliseconds)
Inserting 3,333 documents: 268.50 ms (1/4 second)
Inserting 33,333 documents: 2132.47 ms (2 seconds)
Inserting 100,000 documents: 8361.61 ms (8 seconds)
That's not too shabby for a single NodeJS instance running on commodity hardware!
But again, these should be viewed as "best case" results. Before we dig deeper, I want to analyze the variability of insert times to ensure they're relatively consistent between runs.
I inserted 10K records 100 times (with an empty database before each insertion batch) and plotted a histogram of the execution times:
The average time to insert 10K records is ~585 ms. Almost all execution times are within 50ms of this average, so I'd conclude that the average serves as a fairly good estimator of the "typical" insertion time.
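The driver loop for that experiment could look something like the sketch below, reusing the assumed timedInsert() helper from earlier; destroying and recreating the database between batches is how I'd reproduce the "empty database before each batch" condition.

// Insert 10K documents 100 times, starting from an empty database on every run
async function repeatedRuns(runs = 100, batchSize = 10000) {
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const db = new PouchDB('perf-test');            // fresh/empty database
    timings.push(await timedInsert(db, batchSize));
    await db.destroy();                             // wipe it before the next run
  }
  return timings;                                   // these feed the histogram
}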
Now, let's pre-populate the database with 100K records before running our batches of insertions to see how performance degrades compared to our baseline. Note that I won't be resetting the database back to 100K records before measuring each subsequent batch of 10K insertions, so we can see whether there's an overall increase in insertion time as the record count climbs past 1 million rows (by the 90th test run).
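The growing-database variant only differs in the one-time 100K seed and the absence of any reset between batches; again a hedged sketch reusing the assumed helpers:

// Seed 100K documents once, then run 10K-document batches without ever resetting,
// so the database passes 1M documents around the 90th run
async function growingDbRuns(runs = 100, batchSize = 10000) {
  const db = new PouchDB('perf-test-growing');
  await timedInsert(db, 100000);                    // one-time pre-population
  const timings = [];
  for (let i = 0; i < runs; i++) {
    timings.push(await timedInsert(db, batchSize)); // the database keeps growing
  }
  return timings;
}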
These results are roughly comparable to the empty-database test results, with perhaps a slight rightward shift in insertion times. You'll also notice that this histogram looks less like a normal distribution, because each measurement is not truly independent of the others: after every batch of 10K inserts, the database grows by that amount, which slows down each subsequent batch ever so slightly.
The average insertion time of the last 10 batches was ~608 ms (when the database holds more than 1M rows), while the average of the first 10 batches (when the database only contains ~150K rows) was ~584 ms. That's a slight increase, on average, of about 25 ms. In my opinion, that's a pretty small price to pay considering the database had grown roughly 10x in size!
Now, let's compare average insertion times at different initial record counts to get an even clearer picture of the throughput drag introduced by an ever-growing database index:
There's obviously some variance not fully captured in this image, since the averages depicted reflect a database that keeps growing after each new insertion. But I think that actually makes the numbers more reliable, since they encode some of the "drift" that would occur in the real world as new records accumulate. The good news is that there appears to be only about a 100 ms insertion penalty when the database already contains 10M rows versus when it's completely empty.
Obviously, you'll generally want to stress-test your own use cases before deploying an alpha version of your application prototype, but the initial measurements I've collected here using synthetic data are quite encouraging!
Get excited for Part 2 where we'll be extending our performance profile analysis to lookup queries!