We needed a Dictionary/Hashmap for an algorithm. One small change made it 3x faster.

Our team was working on an algorithm that needed fast lookups.

Simple choice: use a Dictionary.

“It’s O(1), right?”

var dict = new Dictionary<string, Order>();
foreach (var order in ordersList)  // 50,000 items
{
    dict.Add(order.Id, order);
}

But here’s the thing: O(1) isn’t the full story.

Let me break down what’s actually happening inside.


The Ideal World

Imagine if a dictionary worked like this:

key → hash → that hash is the memory address → done

Your key hashes to a number. That number is exactly where your data lives in memory.

No searching. No collisions. True O(1).
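This ideal has a name, direct addressing, and it does work when the key space is tiny. A minimal Java sketch (hypothetical class, illustration only):

```java
// Direct addressing: the key IS the array index, so a lookup is a
// single memory access. No hashing tricks, no collisions possible.
// Only feasible when the key space is tiny (here: keys 0..255).
public class DirectAddress {
    static String[] slots = new String[256];

    static void put(int key, String value) { slots[key] = value; }

    static String get(int key) { return slots[key]; }  // true O(1)

    public static void main(String[] args) {
        put(42, "hello");
        System.out.println(get(42)); // hello
    }
}
```

The catch, as the next paragraph shows, is that real key spaces are nowhere near this small.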

But there’s a problem.

Hash functions return 32-bit or 64-bit values, depending on the language. Even at 32 bits, that’s about 4 billion possible values.

If we reserved one memory slot for every possible hash, an empty dictionary would need about 32 GB of RAM (4 billion slots × 8 bytes per reference).

And you might only store 100 items.

So instead, dictionaries do something clever.


The Real World: A Smaller Array

Rather than 4 billion slots, a dictionary allocates a small array, let’s say 16 slots.

This array is called buckets. Think of it like 16 boxes where your data can live.

Now we need to squeeze our huge hash into this small space:

hash % bucketCount = which bucket to use

hash: 1,894,726,153
buckets: 16
index: 1,894,726,153 % 16 = 9

Simple division. Remainder tells us the bucket.

16 slots instead of 4 billion. Memory problem solved. (In practice, Java’s HashMap keeps its bucket count a power of two and uses bit masking instead of modulo: faster, same result. .NET’s Dictionary uses prime-sized bucket counts instead.)
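As a quick sketch of that bucket math (Java here, since the same trick applies to HashMap; the class name is just for illustration):

```java
public class BucketMath {
    public static void main(String[] args) {
        int hash = 1_894_726_153;
        int bucketCount = 16;

        // Plain modulo: the remainder picks the bucket.
        int indexByModulo = hash % bucketCount;

        // When bucketCount is a power of two, (hash & (bucketCount - 1))
        // keeps only the low bits. Same result, cheaper than division.
        int indexByMask = hash & (bucketCount - 1);

        System.out.println(indexByModulo); // 9
        System.out.println(indexByMask);   // 9
    }
}
```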

But we’ve created a new problem.

When you squeeze 4 billion possibilities into 16 buckets, multiple keys will land in the same bucket.

This is called a collision.


Collisions: Two Keys, One Bucket

Two ways to handle this:

Chaining — each bucket holds a linked list. Colliding items chain together.

Open Addressing — bucket taken? Keep probing forward until you find an empty one.

Either way, you’re now walking through items to find the right one.

Few collisions? Still fast.

All keys in one bucket? That’s O(n), a linear search.
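To make chaining concrete, here is a minimal, hypothetical chaining table in Java (illustration only, not how java.util.HashMap is actually implemented). The two keys below really do collide: "Aa" and "BB" share a hash code in Java.

```java
import java.util.LinkedList;

// Chaining sketch: each bucket holds a linked list of entries,
// and colliding keys simply chain together in the same bucket.
public class ChainingSketch {
    record Entry(String key, String value) {}

    static final int BUCKETS = 16;
    @SuppressWarnings("unchecked")
    static LinkedList<Entry>[] table = new LinkedList[BUCKETS];

    static int bucketOf(String key) {
        // floorMod guards against negative hashCode() values.
        return Math.floorMod(key.hashCode(), BUCKETS);
    }

    static void put(String key, String value) {
        int i = bucketOf(key);
        if (table[i] == null) table[i] = new LinkedList<>();
        table[i].removeIf(e -> e.key().equals(key)); // overwrite duplicates
        table[i].add(new Entry(key, value));
    }

    static String get(String key) {
        int i = bucketOf(key);
        if (table[i] == null) return null;
        // The collision cost: walk the chain to find the right key.
        for (Entry e : table[i]) {
            if (e.key().equals(key)) return e.value();
        }
        return null;
    }

    public static void main(String[] args) {
        put("Aa", "first");
        put("BB", "second"); // lands in the SAME bucket as "Aa"
        System.out.println(get("Aa"));
        System.out.println(get("BB"));
    }
}
```

The longer that walk inside get() gets, the further you drift from O(1).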


The Hidden Cost: Resize

This is where performance silently dies.

Dictionaries start small , just a few buckets.

As items grow, they track a load factor (items ÷ buckets).

When it crosses ~70–75% (the exact threshold differs by implementation), the structure needs more room. It resizes:

1. Allocate a new array (roughly 2x bigger, or the next prime, depending on the implementation)
2. Loop through EVERY existing item
3. Recalculate each: hash % newBucketCount
4. Reinsert into new positions

One .Add() just touched every item in the collection.
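The four steps above can be sketched like this (a toy open-addressing table of int keys in Java; names and details are illustrative, not any real implementation):

```java
// Resize sketch: a tiny open-addressing table of int keys,
// where 0 is reserved to mean "empty slot".
public class ResizeSketch {
    static int[] table = new int[4];
    static int count = 0;

    static void insert(int key, int[] t) {
        int i = Math.floorMod(Integer.hashCode(key), t.length);
        while (t[i] != 0) i = (i + 1) % t.length;  // linear probing
        t[i] = key;
    }

    static void add(int key) {
        // Check the load factor BEFORE inserting.
        if ((count + 1) > table.length * 3 / 4) {
            // Step 1: allocate a new array, roughly 2x bigger.
            int[] bigger = new int[table.length * 2];
            // Steps 2-4: loop through EVERY existing item,
            // recompute hash % newBucketCount, reinsert.
            for (int k : table) {
                if (k != 0) insert(k, bigger);
            }
            table = bigger;
        }
        insert(key, table);
        count++;
    }
}
```

That inner loop over every existing item is the hidden cost: a single add() can trigger O(n) work.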

50,000 items without pre-sizing:

Start:     3 buckets
Resize 1:  7 buckets    → rehash all
Resize 2:  17 buckets   → rehash all
Resize 3:  37 buckets   → rehash all
...
Resize 13: 75,000+ buckets → rehash all

Could be 10-15 resizes depending on implementation. Each one rehashes everything. Tens of thousands of wasted operations.

With pre-sizing:

var dict = new Dictionary<string, Order>(capacity: 50000);

Zero resizes. 3x faster.
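In Java the same idea looks like this (a sketch; the exact speedup depends on workload and runtime, and note that HashMap’s constructor takes a bucket capacity, not an item count):

```java
import java.util.HashMap;
import java.util.Map;

public class PreSized {
    public static void main(String[] args) {
        int count = 50_000;

        // Default capacity: resizes roughly a dozen times on the way up.
        Map<Integer, String> resized = new HashMap<>();

        // Pre-sized: divide by the 0.75 load factor so the table
        // never crosses its resize threshold. Zero resizes.
        Map<Integer, String> presized = new HashMap<>((int) (count / 0.75f) + 1);

        for (int i = 0; i < count; i++) {
            resized.put(i, "order-" + i);
            presized.put(i, "order-" + i);
        }
        // Same contents either way; the difference is the hidden rehash work.
        System.out.println(resized.equals(presized)); // true
    }
}
```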


The Real Complexity

Scenario             Time
Average case         O(1)
Many collisions      O(n)
Resize triggered     O(n)
Bad hash function    O(n)


Simple Takeaway

If you know the size, tell the dictionary.

C#:     new Dictionary<K,V>(capacity: count)
Java:   new HashMap<>((int) (count / 0.75f) + 1)   // capacity is buckets, so account for the 0.75 load factor

If you don’t know the cost, you’re already paying it.
