Why Binary Document Protocols Are Not All Created Equal

BSON vs OSON: A Deep Dive into Algorithmic Complexity

How I Proved O(n) vs O(1) Actually Matters — With Numbers

I've been talking about the algorithmic difference between MongoDB's BSON and Oracle's OSON for a while now. The theory is clean: BSON scans field names sequentially at each document level. OSON hashes and jumps directly to offsets.

O(n) vs O(1).

But theory without proof is just opinion. So I built a test framework to prove it. The results were more dramatic than I expected.

═══════════════════════════════════════

𝗧𝘄𝗼 𝗙𝗼𝗿𝗺𝗮𝘁𝘀, 𝗧𝘄𝗼 𝗘𝗿𝗮𝘀, 𝗧𝘄𝗼 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗖𝘂𝗹𝘁𝘂𝗿𝗲𝘀

Before we dive into the numbers, it's worth understanding how these formats came to be. The difference in performance isn't accidental — it reflects the circumstances of their creation.

𝗕𝗦𝗢𝗡: 𝗧𝗵𝗲 𝗦𝘁𝗮𝗿𝘁𝘂𝗽 𝗦𝗽𝗿𝗶𝗻𝘁

BSON was born in 2007-2009 at 10gen (now MongoDB Inc.), a New York startup founded by Dwight Merriman and Eliot Horowitz. Both came from DoubleClick — Merriman as founder and CTO, Horowitz as an engineer who'd been coding since age five and had a CS degree from Brown.

They weren't trying to build a database. They were building a platform-as-a-service. The database layer was infrastructure — a means to an end. When users kept telling them "we really like this database thing," they pivoted. MongoDB went from internal component to open-source product in roughly 18 months.

BSON was designed for that moment: schema-less flexibility, quick iteration, "good enough" performance for web-scale document storage. The design goal was traversable data — length prefixes that let you skip sub-documents you don't need. It worked. MongoDB took off. But "traversable" isn't the same as "indexed." Within each document level, BSON still scans field names sequentially. That decision — made under startup pressure to ship something that worked — became baked into BSON DNA.

𝗢𝗦𝗢𝗡: 𝗧𝗵𝗲 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲 𝗩𝗲𝘁𝗲𝗿𝗮𝗻𝘀

OSON arrived eight years later, designed by a very different team with very different constraints. The lead architect was Zhen Hua Liu — a database internals veteran with close to 100 published research papers, the originator of the SQL/JSON standard, and Oracle's principal architect for both SQL/JSON and SQL/XML. He'd spent decades thinking about how to efficiently query hierarchical data structures inside a relational engine. The OSON team included distinguished engineers who'd published at VLDB (the premier database research conference), worked on Oracle's query optimizer, and understood at a deep level how binary formats interact with memory hierarchies, CPU caches, and execution plans.

They also had something the 10gen team didn't: 𝗵𝗶𝗻𝗱𝘀𝗶𝗴𝗵𝘁.

By 2017, BSON had been in production for eight years. Its limitations were well-documented. The Oracle team could see exactly where sequential field scanning became a bottleneck — and design around it. The result was hash-indexed navigation. Instead of scanning field names, OSON computes a hash and jumps directly to an offset. O(1) instead of O(n). The format was built into Oracle Database 12.2 in March 2017, refined through 19c, 21c, and 23ai, and open-sourced in 2023.

𝗪𝗵𝘆 𝗧𝗵𝗶𝘀 𝗠𝗮𝘁𝘁𝗲𝗿𝘀

I'm not telling this story to disparage the MongoDB founders. They built something remarkable under real constraints. BSON was a reasonable design for 2009. But it's 2025 now. The workloads have changed. Documents are larger, nesting is deeper, access patterns are more complex. And we now have a format designed by database research veterans who had the luxury of learning from BSON's limitations.

The results that follow aren't surprising when you understand the context. They're the predictable outcome of two very different engineering approaches.

═══════════════════════════════════════

𝗧𝗵𝗲 𝗣𝗿𝗼𝗯𝗹𝗲𝗺 𝗪𝗶𝘁𝗵 𝗘𝘅𝗶𝘀𝘁𝗶𝗻𝗴 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸𝘀

Standard database benchmarks — YCSB, sysbench, TPC — measure end-to-end latency. They'll tell you how long a query takes, but not why. When you're comparing document databases, this matters. A lot.

Because if network latency is 10ms and your document traversal takes 30 microseconds, the traversal cost disappears in the noise. Everything looks roughly the same.

But what about the client side? Every time your application accesses a field in that document, it's paying the traversal cost again. And again. And again.

That's when algorithmic complexity stops being theoretical and starts hitting your wallet. I needed a test that could isolate the traversal cost — eliminate network overhead entirely and measure pure field access performance.

So I built one.

═══════════════════════════════════════

𝗗𝗼𝗰𝗕𝗲𝗻𝗰𝗵: 𝗖𝗹𝗶𝗲𝗻𝘁-𝗦𝗶𝗱𝗲 𝗧𝗿𝗮𝘃𝗲𝗿𝘀𝗮𝗹 𝗔𝗻𝗮𝗹𝘆𝘀𝗶𝘀

DocBench is an open-source Java framework designed to answer one question: How fast can you access a field in a binary JSON document?

The methodology is simple:

1. Insert a test document

2. Fetch the full document once

3. Perform 100,000 client-side field accesses

4. Measure nanosecond-level timing for each access

5. Average across iterations

No network in the measurement loop. No server overhead. Just pure client-side parsing and field navigation. This isolates exactly what we want to measure: the algorithmic cost of finding a field in a binary JSON structure.
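
If you want the flavor of that loop, here's a minimal sketch of the idea. RawBsonDocument.parse() and get() are real MongoDB Java driver APIs; the structure and constants are illustrative, not actual DocBench source:

import org.bson.RawBsonDocument;

// Illustrative measurement loop. RawBsonDocument keeps the raw bytes and
// re-scans field names on every get(), so this times the actual traversal.
public class FieldAccessTimer {
    static final int ITERATIONS = 100_000;

    public static void main(String[] args) {
        // Tiny stand-in document; DocBench generates 100-1000 field documents.
        RawBsonDocument doc = RawBsonDocument.parse("{\"a\": 1, \"b\": 2, \"target\": 42}");

        long sink = 0; // accumulate results so the JIT can't elide the calls
        for (int i = 0; i < 10_000; i++) {
            sink += doc.get("target").asInt32().getValue(); // warm-up
        }

        long start = System.nanoTime();
        for (int i = 0; i < ITERATIONS; i++) {
            sink += doc.get("target").asInt32().getValue();
        }
        long elapsed = System.nanoTime() - start;

        System.out.printf("avg access: %d ns (sink=%d)%n", elapsed / ITERATIONS, sink);
    }
}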

═══════════════════════════════════════

𝗧𝗵𝗲 𝗧𝗲𝘀𝘁: 𝗙𝗶𝗲𝗹𝗱 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻 𝗦𝗰𝗮𝗹𝗶𝗻𝗴

The simplest way to prove algorithmic complexity is to vary the input and measure how time scales.

For O(n) — time should increase linearly with position.

For O(1) — time should stay constant regardless of position.

Test setup:

→ Documents with 100, 500, and 1000 fields

→ Target field placed at positions 1, 50, 100, 500, 1000

→ 100,000 iterations per test (to eliminate noise)

→ Nanosecond-precision timing

→ Random test ordering (to eliminate cache effects)

The question: How long does it take to access a single field at different positions?

═══════════════════════════════════════

𝗧𝗵𝗲 𝗥𝗲𝘀𝘂𝗹𝘁𝘀

[Table: field access time by position, BSON vs OSON]

At position 1000, OSON is 𝟱𝟮𝟵 𝘁𝗶𝗺𝗲𝘀 𝗳𝗮𝘀𝘁𝗲𝗿 than BSON.

Not 5x. Not 50x. 𝟱𝟮𝟵𝘅.

═══════════════════════════════════════

𝗪𝗵𝘆 𝗘𝘃𝗲𝗻 𝗣𝗼𝘀𝗶𝘁𝗶𝗼𝗻 𝟭 𝗜𝘀 𝗦𝗹𝗼𝘄𝗲𝗿

Look at the first row of results: Position 1/100.

BSON: 250 ns. OSON: 99 ns. 2.5x difference.

Wait — if BSON only scanned one field, shouldn't it be roughly equal?

The answer is no, because the operations themselves aren't equal.

𝗕𝗦𝗢𝗡 𝗳𝗶𝗲𝗹𝗱 𝗮𝗰𝗰𝗲𝘀𝘀: String equality comparison

→ Read field name bytes from buffer

→ Compare byte-by-byte against target string

→ On match, return value

𝗢𝗦𝗢𝗡 𝗳𝗶𝗲𝗹𝗱 𝗮𝗰𝗰𝗲𝘀𝘀: Hash lookup

→ Compute hash of field name (often cached)

→ Index into offset dictionary

→ Jump directly to value

String comparison is inherently more expensive than hash lookup, even for a single field. And it gets worse: in BSON, every field you pass requires a full string comparison to determine "is this the one?" In OSON, you compute one hash and jump, regardless of how many fields exist.

This means BSON's traversal cost compounds in two dimensions:

→ 𝗕𝗿𝗲𝗮𝗱𝘁𝗵: More fields at each level = more string comparisons to skip

→ 𝗗𝗲𝗽𝘁𝗵: More nesting levels = more traversal operations

OSON's cost stays flat in both dimensions. Hash + jump. Hash + jump. Done.
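
To make the two strategies concrete, here's a toy sketch in Java. Neither snippet is the real decoder, and Java's String.hashCode() is just a stand-in for OSON's actual hash function; this is the shape of the algorithms, nothing more:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

// Toy model of the two navigation strategies.
class NavigationSketch {

    // BSON-style: compare field names one by one until the target matches.
    // Cost grows with how many fields precede the target: O(n).
    static int bsonStyleFind(byte[][] fieldNames, String target) {
        byte[] targetBytes = target.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < fieldNames.length; i++) {
            if (Arrays.equals(fieldNames[i], targetBytes)) {
                return i;
            }
        }
        return -1;
    }

    // OSON-style: hash the name once, look up a stored offset, jump.
    // Cost is independent of how many fields the object holds: O(1).
    static int osonStyleFind(Map<Integer, Integer> offsetsByHash, String target) {
        Integer offset = offsetsByHash.get(target.hashCode());
        return offset == null ? -1 : offset;
    }
}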

═══════════════════════════════════════

𝗣𝗿𝗼𝘃𝗶𝗻𝗴 𝗢(𝗻) 𝗕𝗲𝗵𝗮𝘃𝗶𝗼𝗿

Look at how BSON scales:

→ Position 1: 250 ns

→ Position 1000: 31,195 ns

That's a 𝟭𝟮𝟱𝘅 𝗶𝗻𝗰𝗿𝗲𝗮𝘀𝗲 in time for a 1000x increase in position.

Linear scaling. O(n) confirmed.

Now look at OSON:

→ Position 1: 99 ns

→ Position 1000: 59 ns

Time actually decreased slightly (within noise margin). Position is irrelevant.

Constant time. O(1) confirmed.

This isn't marketing. This is algorithmic complexity playing out exactly as computer science predicts.

═══════════════════════════════════════

𝗡𝗲𝘀𝘁𝗲𝗱 𝗣𝗮𝘁𝗵 𝗗𝗲𝗽𝘁𝗵: 𝗕𝗿𝗲𝗮𝗱𝘁𝗵 × 𝗗𝗲𝗽𝘁𝗵

Field position isn't the only variable. Nesting depth matters too. Real documents aren't flat. They have structure: order.items[5].product.sku

Every level of nesting is another traversal operation. For BSON, that's another round of sequential string comparisons across all fields at that level. For OSON, it's another hash + jump.

[Table: nested path access time by depth, BSON vs OSON]

The gap widens with depth. But here's the key insight: 𝗯𝗿𝗲𝗮𝗱𝘁𝗵 𝗮𝗻𝗱 𝗱𝗲𝗽𝘁𝗵 𝗺𝘂𝗹𝘁𝗶𝗽𝗹𝘆.

Consider a path like order.items[5].product.sku in a realistic document:

→ Level 1: Skip past _id, created, status, customer... → string comparisons

→ Level 2: Skip past lineNumber, quantity, price... → more string comparisons

→ Level 3: Skip past name, category, weight... → even more string comparisons

→ Level 4: Finally find sku

Each level has its own breadth penalty. BSON pays the string comparison tax at every level, for every field it passes.

OSON? Four hash lookups. Total. Regardless of how many fields exist at each level.
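
In driver code, that path looks something like the sketch below. The BSON side uses the MongoDB Java driver's BsonDocument API; the OSON side uses oracle.sql.json method names as I understand them. Both are shown for shape, not lifted from DocBench:

import org.bson.RawBsonDocument;
import oracle.sql.json.OracleJsonObject;

class NestedPathSketch {
    // BSON: every getDocument()/getArray() step re-scans the field names
    // at that level in the raw bytes before descending.
    static String skuFromBson(RawBsonDocument doc) {
        return doc.getDocument("order")
                  .getArray("items")
                  .get(5).asDocument()
                  .getDocument("product")
                  .getString("sku").getValue();
    }

    // OSON: each step is one hash into the embedded offset dictionary,
    // then a jump -- four lookups total, regardless of sibling count.
    static String skuFromOson(OracleJsonObject doc) {
        return doc.getObject("order")
                  .getArray("items")
                  .get(5).asJsonObject()
                  .getObject("product")
                  .getString("sku");
    }
}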

═══════════════════════════════════════

𝗪𝗵𝘆 𝗖𝗹𝗶𝗲𝗻𝘁-𝗦𝗶𝗱𝗲 𝗠𝗮𝘁𝘁𝗲𝗿𝘀

Here's what most database comparisons miss: your application accesses document fields far more often than it fetches documents.

Think about it:

→ You query the database once

→ You get back a document

→ Your application code accesses 5, 10, maybe 20 fields from that document

→ Each access is a traversal operation

With BSON, every document.get("fieldName") is an O(n) scan through all preceding fields.

With OSON, every document.get("fieldName") is an O(1) hash lookup.

If your application touches 10 fields per document and processes 10,000 documents per second, that's 100,000 field accesses per second. The difference between 30 microseconds and 60 nanoseconds per access?

That's 3 seconds vs 6 milliseconds. Every second.
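
Spelled out: 100,000 accesses/second × 30 µs = 3 full seconds of CPU time per wall-clock second (three cores doing nothing but traversal), while 100,000 × 60 ns = 6 ms. Same work, roughly 500x apart.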

═══════════════════════════════════════

𝗧𝗵𝗲 𝗥𝗮𝘄𝗕𝘀𝗼𝗻𝗗𝗼𝗰𝘂𝗺𝗲𝗻𝘁 𝗥𝗲𝗮𝗹𝗶𝘁𝘆

"Wait," you might say. "I use MongoDB's Document class and field access feels fast."

You're right — but here's what's happening under the hood.

MongoDB's Document class wraps a LinkedHashMap<String, Object>. When your driver receives BSON from the server, it fully parses the entire binary structure and populates that HashMap. Every field, every nested object, every array element — all converted to Java objects and inserted into the map. After that parse completes, yes, .get("fieldName") is O(1). It's a Java HashMap lookup. But you've already paid the cost. The O(n) traversal happened during deserialization — you just didn't see it as a separate operation.
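
Here are the two choices side by side, using real driver classes with an illustrative document:

import org.bson.Document;
import org.bson.RawBsonDocument;

public class EagerVsLazy {
    public static void main(String[] args) {
        String json = "{\"a\": 1, \"b\": 2, \"c\": 3}";

        // Eager: Document.parse() walks the whole structure once, up front,
        // and loads a LinkedHashMap. Every later get() is a map lookup.
        Document eager = Document.parse(json);
        System.out.println(eager.getInteger("a"));

        // Lazy: RawBsonDocument keeps the bytes. No upfront parse, but
        // every get() re-scans field names in the raw buffer.
        RawBsonDocument lazy = RawBsonDocument.parse(json);
        System.out.println(lazy.get("b").asInt32().getValue());
    }
}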

𝗧𝗵𝗲 𝘁𝗿𝗮𝗱𝗲-𝗼𝗳𝗳 𝗹𝗼𝗼𝗸𝘀 𝗹𝗶𝗸𝗲 𝘁𝗵𝗶𝘀:

[Diagram: eager deserialization vs lazy raw access trade-off]

I measured exactly this trade-off. Here's what deserialization actually costs:

Deserialization Cost (one-time overhead):

[Table: one-time deserialization cost by document type]

That's the upfront cost you're paying every time you deserialize to a Document or BsonDocument. But what do you get in return?

Single Field Access (per access):

[Table: single field access time by position and document type]

After deserialization, BsonDocument and Document achieve O(1) access (~20-30 ns) via HashMap lookup. But look at RawBsonDocument — it's paying 155 ns for position 1, and a staggering 25,871 ns for position 1000. That's the O(n) sequential scan in action.

The break-even point: How many accesses justify deserialization?

[Table: break-even access counts for deserialization]

If you're accessing fewer than 10-25 fields from a document, RawBsonDocument wins — you skip the deserialization overhead. Access more fields than that, and pre-deserializing to Document becomes worthwhile.

But look at the Oracle OSON column in those tables. OSON wins every single scenario. At 100 different field accesses on a 100-field document: Oracle takes 4,759 ns. Even MongoDB's optimized Document class (with its ~13K ns deserialization overhead already paid) takes 13,346 ns. That's 2.8x faster — and OSON didn't need to deserialize anything.

[Chart: total field access cost comparison]

𝗢𝗦𝗢𝗡 𝗱𝗼𝗲𝘀𝗻'𝘁 𝗵𝗮𝘃𝗲 𝘁𝗵𝗶𝘀 𝘁𝗿𝗮𝗱𝗲-𝗼𝗳𝗳.

Oracle's OracleJsonObject gives you O(1) field access whether you're working with the raw binary or a parsed representation. The hash index is baked into the format itself. There's no "parse everything first" vs "scan on every access" decision to make.

This is why the DocBench tests use RawBsonDocument: it isolates the true binary format performance. Using Document would measure Java's HashMap after the hidden parse, which isn't a fair comparison of the underlying formats. But as this deserialization analysis shows, even MongoDB's optimized path can't escape the fundamental O(n) cost; it just moves it to a different place.

═══════════════════════════════════════

𝗡𝗼𝘄 𝗟𝗲𝘁'𝘀 𝗟𝗼𝗼𝗸 𝗮𝘁 𝗨𝗽𝗱𝗮𝘁𝗲𝘀

Read performance tells half the story. What happens when you need to modify a document?

Both BSON and OSON require a full cycle for updates: decode → modify → encode. Neither format supports true in-place modification of arbitrary fields. You have to deserialize, change the value, and re-serialize.

I added 17 update scenarios to DocBench to measure this cycle. The results were interesting — and different from reads.
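
On the BSON side, the cycle being measured looks roughly like this sketch. decode() with a BsonDocumentCodec and the RawBsonDocument(document, codec) constructor are real driver APIs; the field name and value are illustrative:

import org.bson.BsonDocument;
import org.bson.BsonInt32;
import org.bson.RawBsonDocument;
import org.bson.codecs.BsonDocumentCodec;

// The full BSON update cycle: no in-place mutation of the raw bytes.
class UpdateCycleSketch {
    static RawBsonDocument updateStatus(RawBsonDocument raw) {
        // 1. Decode: materialize the bytes into a mutable BsonDocument.
        BsonDocument doc = raw.decode(new BsonDocumentCodec());
        // 2. Modify: change the value in memory.
        doc.put("status", new BsonInt32(2));
        // 3. Encode: re-serialize the entire document back to bytes.
        return new RawBsonDocument(doc, new BsonDocumentCodec());
    }
}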

𝗦𝗶𝗺𝗽𝗹𝗲 𝗙𝗶𝗲𝗹𝗱 𝗨𝗽𝗱𝗮𝘁𝗲𝘀 (the cost of BSON offset recalculation):

[Table: simple field update time by position]

Notice something counterintuitive? For reads, position 1 is fastest (less scanning). For updates, position 1 is 𝘀𝗹𝗼𝘄𝗲𝘀𝘁.

This is offset recalculation in action. BSON uses length prefixes — when you modify a field, every subsequent field's offset may need adjustment:

→ Update at position 1 → recalculate offsets for ~99 subsequent fields

→ Update at position 50 → recalculate offsets for ~50 subsequent fields

→ Update at position 100 → recalculate offsets for 0 subsequent fields

OSON stays constant (~10.2-10.4K ns) because hash-indexed offsets are computed, not stored sequentially. No cascading recalculation.

𝗡𝗲𝘀𝘁𝗲𝗱 𝗨𝗽𝗱𝗮𝘁𝗲𝘀 (OSON's hash/jump wins at depth):

[Table: nested update time by depth]

At shallow depths, the formats are roughly equal. But at depth 3+, OSON's O(1) navigation advantage emerges — 1.38-1.40x faster for deeply nested updates.

𝗙𝗶𝗲𝗹𝗱 𝗜𝗻𝘀𝗲𝗿𝘁𝗶𝗼𝗻 (OSON efficiency wins again):

[Table: field insertion time by position]

Insertion position doesn't matter — both formats rebuild the entire document.

𝗔𝗿𝗿𝗮𝘆 𝗚𝗿𝗼𝘄𝘁𝗵 (where BSON holds its own):

[Table: array append time]

BSON's BsonArray.add() is slightly more efficient than OSON's copy-and-grow approach.
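
For context, here's what that append path looks like on the client side; BsonArray is the driver's List<BsonValue> implementation, and the loop is illustrative:

import org.bson.BsonArray;
import org.bson.BsonInt32;

// Client-side appends are plain list adds; the serialization cost is
// only paid when the enclosing document is re-encoded.
class ArrayAppendSketch {
    static BsonArray appendReadings(int count) {
        BsonArray readings = new BsonArray();
        for (int i = 0; i < count; i++) {
            readings.add(new BsonInt32(i));
        }
        return readings;
    }
}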

𝗞𝗲𝘆 𝗶𝗻𝘀𝗶𝗴𝗵𝘁𝘀 𝗳𝗿𝗼𝗺 𝘂𝗽𝗱𝗮𝘁𝗲 𝗮𝗻𝗮𝗹𝘆𝘀𝗶𝘀:

→ 𝗢𝗳𝗳𝘀𝗲𝘁 𝗿𝗲𝗰𝗮𝗹𝗰𝘂𝗹𝗮𝘁𝗶𝗼𝗻 𝗽𝗲𝗻𝗮𝗹𝘁𝘆: BSON updates at the beginning of a document are slowest (1.30x OSON advantage at position 1 vs 1.10x at position 100) due to cascading length prefix recalculation. OSON's hash-indexed offsets don't cascade.

→ 𝗡𝗲𝘀𝘁𝗲𝗱 𝘂𝗽𝗱𝗮𝘁𝗲𝘀 𝘀𝗵𝗼𝘄 𝗢(𝟭) 𝗮𝗱𝘃𝗮𝗻𝘁𝗮𝗴𝗲: At depth 3+, OSON is 1.38-1.40x faster. If your update patterns involve deeply nested paths, the navigation difference matters.

→ 𝗔𝗿𝗿𝗮𝘆 𝗼𝗽𝗲𝗿𝗮𝘁𝗶𝗼𝗻𝘀 𝘀𝗹𝗶𝗴𝗵𝘁𝗹𝘆 𝗳𝗮𝘃𝗼𝗿 𝗕𝗦𝗢𝗡: For append-heavy array workloads, BSON's BsonArray.add() edges out OSON's copy-and-grow approach.

═══════════════════════════════════════

𝗦𝗲𝗿𝘃𝗲𝗿-𝗦𝗶𝗱𝗲 𝗨𝗽𝗱𝗮𝘁𝗲𝘀: 𝗪𝗵𝗲𝗿𝗲 𝗗𝗼𝗰𝘂𝗺𝗲𝗻𝘁 𝗦𝗶𝘇𝗲 𝗗𝗲𝘀𝘁𝗿𝗼𝘆𝘀 𝗠𝗼𝗻𝗴𝗼𝗗𝗕

Client-side measurements isolate format efficiency. But what happens when you include the full round-trip — network, server processing, durability guarantees?

I ran 25 server-side update tests with proper durability parity. This matters more than most performance tests acknowledge.

𝗔 𝗻𝗼𝘁𝗲 𝗼𝗻 𝗱𝘂𝗿𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝘀𝗲𝘁𝘁𝗶𝗻𝗴𝘀 (because fair comparisons require equal guarantees):

MongoDB updates issued with WriteConcern {w:1} and no explicit j:true do not wait for journal sync. WiredTiger only syncs journal records to disk at 100ms intervals or when creating new journal files (~100MB). MongoDB's own documentation warns that "updates can be lost following a hard shutdown" during this window.

Oracle, by contrast, automatically waits for redo log sync before acknowledging every COMMIT — the log file sync wait event is mandatory. There's no "fast but maybe lose your data" mode.

For a fair comparison, I configured MongoDB as a single-node replica set (https://www.mongodb.com/docs/manual/tutorial/deploy-replica-set-for-testing/) — MongoDB's own recommended configuration for development and testing — with WriteConcern {w:1, j:true}:

→ 𝘄:𝟭 — Write acknowledged by primary (avoids replication overhead in single-node test)

→ 𝗷:𝘁𝗿𝘂𝗲 — Write must sync to on-disk journal before acknowledgment

Why a replica set instead of standalone? Because standalone MongoDB doesn't support transactions or change streams. A single-node replica set enables full feature parity while isolating single-node write performance. This is exactly what MongoDB recommends for testing environments.

This configuration ensures MongoDB waits for journal sync, matching Oracle's behavior. Same durability guarantees, same feature set, fair fight.
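
In Java driver terms, that setup looks something like this sketch. The connection string is illustrative; WriteConcern.W1.withJournal(true) is the real driver API:

import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

// Durability parity: w:1 plus j:true, so the primary acknowledges a write
// only after it has synced to the on-disk journal.
public class DurabilityParity {
    public static void main(String[] args) {
        MongoClientSettings settings = MongoClientSettings.builder()
                .applyConnectionString(
                        new ConnectionString("mongodb://localhost:27017/?replicaSet=rs0"))
                .writeConcern(WriteConcern.W1.withJournal(true))
                .build();

        try (MongoClient client = MongoClients.create(settings)) {
            MongoCollection<Document> orders =
                    client.getDatabase("bench").getCollection("orders");
            // Acknowledged only after the journal sync completes.
            orders.updateOne(new Document("_id", 1),
                    new Document("$set", new Document("status", "SHIPPED")));
        }
    }
}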

𝗧𝗵𝗲 𝗿𝗲𝘀𝘂𝗹𝘁𝘀 𝗲𝘅𝗽𝗼𝘀𝗲𝗱 𝘀𝗼𝗺𝗲𝘁𝗵𝗶𝗻𝗴 𝘁𝗵𝗲 𝗰𝗹𝗶𝗲𝗻𝘁-𝘀𝗶𝗱𝗲 𝘁𝗲𝘀𝘁𝘀 𝗼𝗻𝗹𝘆 𝗵𝗶𝗻𝘁𝗲𝗱 𝗮𝘁: 𝗠𝗼𝗻𝗴𝗼𝗗𝗕'𝘀 𝘂𝗽𝗱𝗮𝘁𝗲 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲 𝗱𝗲𝗴𝗿𝗮𝗱𝗲𝘀 𝘄𝗶𝘁𝗵 𝗱𝗼𝗰𝘂𝗺𝗲𝗻𝘁 𝘀𝗶𝘇𝗲. 𝗢𝗦𝗢𝗡'𝘀 𝗱𝗼𝗲𝘀𝗻'𝘁.

𝗟𝗮𝗿𝗴𝗲 𝗗𝗼𝗰𝘂𝗺𝗲𝗻𝘁 𝗦𝗶𝗻𝗴𝗹𝗲-𝗙𝗶𝗲𝗹𝗱 𝗨𝗽𝗱𝗮𝘁𝗲𝘀:

[Table: single-field update latency by document size]

Read that last row. Updating a single field in a 4MB document:

→ 𝗠𝗼𝗻𝗴𝗼𝗗𝗕: 7.4 milliseconds

→ 𝗢𝗦𝗢𝗡: 1.1 milliseconds

MongoDB's time 𝗶𝗻𝗰𝗿𝗲𝗮𝘀𝗲𝗱 𝟯.𝟳𝘅 going from 10KB to 4MB. OSON's time barely moved.

This isn't about field access anymore. This is about what happens when you have to re-serialize and re-write a large document. MongoDB's architecture pays a tax proportional to document size. OSON's doesn't.

𝗔𝗿𝗿𝗮𝘆 𝗢𝗽𝗲𝗿𝗮𝘁𝗶𝗼𝗻𝘀 (Delete from Middle):

[Table: array delete-from-middle latency]

Deleting from the middle of an array requires shifting subsequent elements. MongoDB's sequential layout means more work. OSON handles it with less overhead.

𝗪𝗵𝗲𝗿𝗲 𝗠𝗼𝗻𝗴𝗼𝗗𝗕 𝗪𝗶𝗻𝘀 (Large Homogeneous Arrays):

[Table: large homogeneous array append latency]

Credit where due: MongoDB has optimizations for large homogeneous arrays. If your workload is append-only to massive scalar arrays, MongoDB has an edge.

But ask yourself: how often is that your access pattern?

𝗢𝘃𝗲𝗿𝗮𝗹𝗹 𝗦𝗲𝗿𝘃𝗲𝗿-𝗦𝗶𝗱𝗲 𝗨𝗽𝗱𝗮𝘁𝗲 𝗥𝗲𝘀𝘂𝗹𝘁𝘀:

[Table: overall server-side update results]

The pattern is clear: 𝗢𝗦𝗢𝗡 𝘄𝗶𝗻𝘀 𝗼𝗻 𝗿𝗲𝗮𝗹-𝘄𝗼𝗿𝗹𝗱 𝘂𝗽𝗱𝗮𝘁𝗲 𝗽𝗮𝘁𝘁𝗲𝗿𝗻𝘀. 𝗠𝗼𝗻𝗴𝗼𝗗𝗕 𝘄𝗶𝗻𝘀 𝗼𝗻 𝗲𝗱𝗴𝗲 𝗰𝗮𝘀𝗲𝘀.

═══════════════════════════════════════

𝗕𝘂𝘁 𝗪𝗵𝗮𝘁 𝗔𝗯𝗼𝘂𝘁 𝗞𝗲𝘆 𝗢𝗿𝗱𝗲𝗿?

A fair question. BSON preserves field order. OSON, by design, does not — because the JSON data model doesn't define key order for objects.

Does this really matter? In practice, rarely. In fact, RFC 8259 (https://datatracker.ietf.org/doc/html/rfc8259) — the official JSON specification — explicitly defines an object as "an unordered collection of zero or more name/value pairs." The spec further advises that "implementations whose behavior does not depend on member ordering will be interoperable." If consuming applications depend on order, they should be using arrays or parsing by key name, not position.

Also, BSON's guarantee is weaker than it appears. MongoDB doesn't always guarantee field order is preserved after updates. If you modify a document in certain ways, fields may be reordered. So even with BSON, key order isn't something applications can reliably depend on.

The bottom line: if your application genuinely requires deterministic field ordering, BSON might support it, depending on what you're doing with your data. But in 30 years of enterprise data work, I've yet to encounter a production use case where field order in a JSON document was a hard requirement, so I tend to agree with the JSON spec on this. If you have a good reason, I'd love to hear about it.

═══════════════════════════════════════

𝗪𝗵𝗲𝗻 𝗗𝗼𝗲𝘀 𝗧𝗵𝗶𝘀 𝗠𝗮𝘁𝘁𝗲𝗿?

Not every workload hits these bottlenecks. If you're doing simple key-value lookups on small documents, both formats perform well. But the difference compounds in specific scenarios:

𝗛𝗶𝗴𝗵-𝘁𝗵𝗿𝗼𝘂𝗴𝗵𝗽𝘂𝘁 𝗔𝗣𝗜𝘀: Thousands of requests per second, each touching multiple fields. O(n) adds up.

𝗖𝗼𝗺𝗽𝗹𝗲𝘅 𝗱𝗼𝗰𝘂𝗺𝗲𝗻𝘁𝘀: E-commerce orders, IoT telemetry, user activity logs. Real documents have 50-200 fields. Position matters.

𝗗𝗲𝗲𝗽𝗹𝘆 𝗻𝗲𝘀𝘁𝗲𝗱 𝗮𝗰𝗰𝗲𝘀𝘀: Modern document schemas nest 4-6 levels deep. Every level is a traversal.

𝗛𝗼𝘁 𝗽𝗮𝘁𝗵𝘀 𝗶𝗻 𝗮𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 𝗰𝗼𝗱𝗲: That one function that processes every request and pulls 5 fields from a document? It's paying the traversal tax on every call.

If any of these describe your workload, you're paying the BSON traversal tax.

═══════════════════════════════════════

𝗧𝗵𝗲 𝗙𝗿𝗮𝗺𝗲𝘄𝗼𝗿𝗸

DocBench is MIT-licensed and available on GitHub: github.com/rhoulihan/DocBench

𝗪𝗵𝗮𝘁'𝘀 𝗶𝗻𝗰𝗹𝘂𝗱𝗲𝗱:

→ Client-side field access analysis (network overhead eliminated)

→ Update cycle measurement (decode → modify → encode)

→ Server-side update tests with durability parity (document size scaling, array operations)

→ BSON RawBsonDocument vs OSON OracleJsonObject comparison

→ Position-based, depth-based, and update test suites

→ Nanosecond-precision timing with 10K-100K iterations

→ Seeded random generation for reproducible results

𝗥𝗲𝗾𝘂𝗶𝗿𝗲𝗺𝗲𝗻𝘁𝘀:

→ Java 21+

→ Docker (for TestContainers)

→ MongoDB 7.0+ and/or Oracle 26ai

Run the comparison yourself:

./gradlew integrationTest --tests "*.BsonVsOsonClientSideTest"

The numbers don't lie.

═══════════════════════════════════════

𝗪𝗵𝗮𝘁 𝗧𝗵𝗶𝘀 𝗠𝗲𝗮𝗻𝘀

Binary JSON formats aren't interchangeable. The internal structure determines access performance.

𝗕𝗦𝗢𝗡 is optimized for write-heavy workloads with simple, linear access patterns. It's efficient for what MongoDB was originally designed for: append-heavy document stores where you read the whole document anyway.

𝗢𝗦𝗢𝗡 is optimized for hybrid workloads with complex access patterns. It's designed for the reality of modern applications: mixed patterns, deep nesting, partial document access, and high-throughput field extraction.

The trade-off isn't theoretical anymore. It's measured.

𝗔𝗻𝗱 𝗵𝗲𝗿𝗲'𝘀 𝘁𝗵𝗲 𝗸𝗶𝗰𝗸𝗲𝗿: Because OSON makes more efficient use of compute resources, Oracle can offer Autonomous JSON Database at a lower price point than MongoDB Atlas — and still outperform it. You're not paying a premium for better performance. You're paying less and getting more.

Icing on the cake? ADB includes 3X on-demand autoscaling — when your workload spikes, compute scales automatically and you only pay for what you use, billed per-second. Efficient format + elastic infrastructure = cost optimization MongoDB can't match.

That's what happens when you design a binary format with decades of database engineering experience instead of startup pressure to ship.

═══════════════════════════════════════

𝗧𝗵𝗲 𝗧𝗮𝗸𝗲𝗮𝘄𝗮𝘆

"Traversable" sounds fast. "Indexed" actually is.

Both formats avoid parsing data you don't need — that's table stakes. But what happens when you do need a field buried in a complex document? Or update a field in a large document?

BSON scans. OSON jumps.

𝗧𝗵𝗲 𝗳𝗶𝗻𝗮𝗹 𝘀𝗰𝗼𝗿𝗲𝗰𝗮𝗿𝗱:

→ Client-side field access: 𝗢𝗦𝗢𝗡 𝟳𝟭𝘅 𝗳𝗮𝘀𝘁𝗲𝗿 𝗼𝘃𝗲𝗿𝗮𝗹𝗹

→ Server-side updates: 𝗢𝗦𝗢𝗡 𝟭.𝟭𝟵𝘅 𝗳𝗮𝘀𝘁𝗲𝗿 𝗼𝘃𝗲𝗿𝗮𝗹𝗹

→ Tests won: 𝗢𝗦𝗢𝗡 𝟯𝟬, 𝗠𝗼𝗻𝗴𝗼𝗗𝗕 𝟯

At position 1? Noticeable impact.

At position 1000? 𝟱𝟮𝟵𝘅 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝗰𝗲.

Updating a 4MB document? 𝟲.𝟳𝘅 𝗱𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝗰𝗲.

Know your binary formats. Know your access patterns. Know your costs.

The framework is open source. The methodology is documented. The results are reproducible.

Prove it to yourself.

═══════════════════════════════════════

Rick Houlihan has spent 30+ years in enterprise data architecture, including leadership roles at Amazon Web Services, MongoDB, and Oracle. He is currently Field CTO for JSON Duality at Oracle, focused on developer adoption of Oracle's converged database platform.

═══════════════════════════════════════

#Oracle #MongoDB #BSON #OSON #JSON #Performance #DatabaseInternals #DataEngineering #OpenSource


