PouchDB Mini-Dive: Performance Profiling - Part 1
Introduction
In some of my earlier articles on PouchDB and a handful of its plugins, I've been making the case that PouchDB can serve as a handy utility for quickly prototyping new applications. All this talk eventually led me down a new line of questioning: once you've prototyped your application with it, just how performant is it as you scale up?
If PouchDB has the requisite performance chops, this could prove to be a potent combination: you could even publicly deploy your application prototype for alpha testing by real users without having to swap out PouchDB for some other solution! How awesome would that be?!
Today we'll be investigating insertion performance in various scenarios, and in a future article we'll look at query performance with the .find() API.
The Setup
2021 16" MacBook Pro - M1 Pro 16GB
macOS Ventura 13.5.1 (22G90)
NodeJS v16.17.0
PouchDB@8.0.1 (Local DB w/ LevelDB Storage Engine)
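For reference, the database under test is just a plain local PouchDB instance in NodeJS, which uses the LevelDB-backed adapter by default. A minimal setup sketch (the 'perf-test' database name is only an illustrative placeholder):

const PouchDB = require('pouchdb');

// In NodeJS, require('pouchdb') defaults to the LevelDB-backed adapter for local databases
const db = new PouchDB('perf-test');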
Show Me The Data
To get a reference/baseline, I measured insertion times for various record quantities, where each record looks like the following:
const name = 'test';
const link = 'http://www.example.com/example1?=example=example';

const doc = {
  name,
  link,
  name2: name,
  link2: link,
  test1: { name, link },
  test4: { name, link },
  test5: { name, link },
  test6: { name, link },
  test7: { name, link }
};
I've duplicated name and link in order to artificially expand the size of the document to better simulate the character of real-world data.
Because this initial measurement is a baseline, we want to provide ideal initial conditions before measuring each bulk insertion attempt - a completely empty database.
Side Note: Inserting 100K records means issuing 100K distinct Promises at once and waiting for all of them to resolve in parallel with Promise.allSettled().
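For concreteness, here's roughly how each timed run could be structured. This is a sketch rather than my exact benchmark script - the timedInsert() function and docTemplate name are illustrative assumptions - but the one-Promise-per-document approach and Promise.allSettled() mirror the side note above.

const { performance } = require('perf_hooks');

// The document shape from above, duplicated fields and all
const name = 'test';
const link = 'http://www.example.com/example1?=example=example';
const docTemplate = {
  name, link, name2: name, link2: link,
  test1: { name, link }, test4: { name, link }, test5: { name, link },
  test6: { name, link }, test7: { name, link }
};

// Issue `count` inserts in parallel and return the elapsed time in milliseconds
async function timedInsert(db, count) {
  const start = performance.now();
  const inserts = [];
  for (let i = 0; i < count; i++) {
    inserts.push(db.post({ ...docTemplate })); // db.post() assigns a fresh _id to each copy
  }
  await Promise.allSettled(inserts);           // wait for every insert to settle
  return performance.now() - start;
}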
Here are the measurements:
Inserting 1 document: 2.18 ms (milliseconds)
Inserting 3,333 documents: 268.50 ms (1/4 second)
Inserting 33,333 documents: 2132.47 ms (2 seconds)
Inserting 100,000 documents: 8361.61 ms (8 seconds)
That's not too shabby for a single NodeJS instance running on commodity hardware!
But again, these should be viewed as "best case" results. Before we dig deeper, I want to analyze the variability of insert times to ensure they're relatively consistent between runs.
I inserted 10K records 100 times (with an empty database before each insertion batch) and plotted a histogram of the execution times:
The average time to insert 10K records is ~585 ms. Almost all execution times are within 50ms of this average, so I'd conclude that the average serves as a fairly good estimator of the "typical" insertion time.
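The driver loop for that experiment could look something like the sketch below, reusing the assumed timedInsert() helper from earlier; destroying and recreating the database between batches is how I'd reproduce the "empty database before each batch" condition.

// Insert 10K documents 100 times, starting from an empty database on every run
async function repeatedRuns(runs = 100, batchSize = 10000) {
  const timings = [];
  for (let i = 0; i < runs; i++) {
    const db = new PouchDB('perf-test');            // fresh/empty database
    timings.push(await timedInsert(db, batchSize));
    await db.destroy();                             // wipe it before the next run
  }
  return timings;                                   // these feed the histogram
}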
Now, let's pre-populate the database with 100K records before running our batches of insertions to see how performance degrades compared to our baseline. Note that I won't be resetting the database back to 100K records before measuring each subsequent batch of 10K insertions, so we can see whether there's an overall increase in insertion time as the record count climbs past 1 million rows (by the 90th test run).
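The growing-database variant only differs in the one-time 100K seed and the absence of any reset between batches; again a hedged sketch reusing the assumed helpers:

// Seed 100K documents once, then run 10K-document batches without ever resetting,
// so the database passes 1M documents around the 90th run
async function growingDbRuns(runs = 100, batchSize = 10000) {
  const db = new PouchDB('perf-test-growing');
  await timedInsert(db, 100000);                    // one-time pre-population
  const timings = [];
  for (let i = 0; i < runs; i++) {
    timings.push(await timedInsert(db, batchSize)); // the database keeps growing
  }
  return timings;
}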
These results are roughly comparable to the empty-database test results, with perhaps a slight rightward shift in insertion times. You'll also notice that this histogram looks less like a normal distribution, because each measurement is not truly independent of the others: after every batch of 10K inserts, the database grows by that amount, which slows down each subsequent batch ever so slightly.
The average insertion time of the last 10 batches was ~608 ms (when the database holds more than 1M rows), while the average of the first 10 batches (when the database only contains ~150K rows) was ~584 ms. That's a slight increase, on average, of about 25 ms. In my opinion, that's a pretty small price to pay considering the database had grown roughly 10x in size!
Now, let's compare average insertion times at different initial record counts to get an even clearer picture of the throughput drag introduced by an ever-growing database index:
There's obviously some variance not fully captured in this image, since the averages depicted reflect a database that keeps growing after each new insertion. But I think that actually makes the numbers more reliable, since they encode some of the "drift" that would occur in the real world as new records accumulate. The good news is that there appears to be only about a 100 ms insertion penalty when the database already contains 10M rows versus when it's completely empty.
Obviously, you'll generally want to stress-test your own use cases before deploying an alpha version of your application prototype, but the initial measurements I've collected here using synthetic data are quite encouraging!
Get excited for Part 2 where we'll be extending our performance profile analysis to lookup queries!