🔥 Internal Working of HashMap in Java — Deep Dive (Source-Level + Real-World)

Most developers know that HashMap gives O(1) performance. Very few know why it sometimes suddenly doesn’t.

And that gap is exactly where production outages happen.

Let’s break HashMap down, from memory layout → hashing → collisions → resizing → Java 8 changes → concurrency issues → GC impact.

1️⃣ What HashMap Really Is (Not the textbook version)

At runtime, a HashMap is:

An array of references (Node[])
→ each index is a bucket
→ each bucket holds:
   - null
   - single Node
   - linked list of Nodes
   - red-black tree of Nodes (Java 8+)
        

It is not just a map. It is a carefully optimized memory + CPU trade-off.

2️⃣ Internal Node Structure (JDK Source Reality)

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}
        

Important details:

  • hash is cached → hashCode not recalculated
  • next enables chaining
  • TreeNode extends Node (Java 8+)

👉 Every entry is an object allocation → memory & GC matter.


3️⃣ Hashing: Why hashCode() Is Not Enough

Step 1: hashCode() is called

int h = key.hashCode();
        

Step 2: Bit spreading (CRITICAL)

h ^ (h >>> 16)
        

Why?

  • Many hashCode() results differ mainly in their upper bits
  • HashMap uses only the lower bits to compute the index
  • The XOR folds upper-bit entropy into the lower bits

👉 This single line prevents massive collision disasters
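The spread step can be seen in isolation. Below is a sketch mirroring the JDK's internal hash() helper (HashSpread is an illustrative name, not a JDK class): two hashCodes that differ only in their upper 16 bits would collide without spreading.

```java
// Mirrors the bit-spreading step of java.util.HashMap.hash() (OpenJDK 8+).
public class HashSpread {
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        // These two hashes differ only in the upper 16 bits, so the raw
        // index (n - 1) & h is identical for both -> guaranteed collision.
        int a = 0x0001_0000, b = 0x0002_0000;
        int n = 16;
        System.out.println(((n - 1) & a) + " vs " + ((n - 1) & b));      // 0 vs 0
        System.out.println(((n - 1) & spreadInt(a)) + " vs "
                         + ((n - 1) & spreadInt(b)));                    // 1 vs 2
    }

    static int spreadInt(int h) { return h ^ (h >>> 16); }
}
```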


4️⃣ Index Calculation (Why power of 2?)

index = (n - 1) & hash
        

Why NOT %?

  • % costs an integer division (and can go negative for negative hashes)
  • & is a single CPU instruction
  • (n - 1) & hash matches the remainder only when n is a power of 2

This is why:

16, 32, 64, 128 ...
        

are used as capacities.
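A quick demonstration of the mask (IndexCalc is a hypothetical helper, not JDK code). Note the bonus: unlike %, the mask also maps negative hashCodes to a valid non-negative index.

```java
public class IndexCalc {
    // Same index computation HashMap uses; n must be a power of two.
    static int index(int hash, int n) { return (n - 1) & hash; }

    public static void main(String[] args) {
        int n = 16;
        for (int h : new int[]{0, 5, 17, -3}) {
            // For powers of two, (n - 1) & h equals the non-negative
            // remainder; Java's % would return -3 for h = -3.
            System.out.println(h + " -> " + index(h, n));
        }
    }
}
```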

5️⃣ put() — Complete Internal Flow

map.put(key, value);
        

What actually happens:

  1. Compute hash (with bit spreading)
  2. Calculate bucket index
  3. Check bucket: empty → place node; same key → replace value; collision → append to list (or tree)
  4. Increment size
  5. Resize if size exceeds the threshold

⚠️ Every step has performance implications
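The five steps above can be sketched as a toy map (MiniMap is illustrative only; it skips treeification and real resizing, so treat it as a sketch of the control flow, not the JDK source):

```java
import java.util.Objects;

// Minimal sketch of HashMap.putVal()'s flow: hash -> index -> bucket
// check -> insert/replace -> size & resize check.
public class MiniMap<K, V> {
    static class Node<K, V> {
        final int hash; final K key; V value; Node<K, V> next;
        Node(int hash, K key, V value) { this.hash = hash; this.key = key; this.value = value; }
    }

    @SuppressWarnings("unchecked")
    Node<K, V>[] table = (Node<K, V>[]) new Node[16];
    int size;

    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        int h = hash(key);                          // 1. compute hash
        int i = (table.length - 1) & h;             // 2. bucket index
        for (Node<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                V old = e.value; e.value = value;   // 3a. key exists -> replace
                return old;
            }
        }
        Node<K, V> n = new Node<>(h, key, value);   // 3b. new key -> chain
        n.next = table[i];
        table[i] = n;
        if (++size > table.length * 0.75) {         // 4-5. size & resize check
            // the real HashMap doubles the table and redistributes nodes here
        }
        return null;
    }

    public V get(K key) {
        int h = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & h]; e != null; e = e.next)
            if (e.hash == h && Objects.equals(e.key, key)) return e.value;
        return null;
    }

    public static void main(String[] args) {
        MiniMap<String, Integer> m = new MiniMap<>();
        m.put("alpha", 1);
        m.put("alpha", 2);   // same key -> value replaced, size unchanged
        System.out.println(m.get("alpha") + " size=" + m.size);  // 2 size=1
    }
}
```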

6️⃣ Collision Handling — Where Things Go Wrong


Before Java 8

Bucket → Linked List
Worst case: O(n)
        

Java 8+

Bucket → Linked List → Red-Black Tree
Worst case: O(log n)
        

Treeification Conditions (Very Specific)

  • Bucket size ≥ 8
  • Table size ≥ 64

If table size < 64: 👉 HashMap resizes instead of treeifying

Why?

  • Tree overhead is expensive for small maps
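A worst-case demo, assuming a deliberately broken hashCode (BadKey is a made-up class, not from any library): even with every key in one bucket, lookups stay correct; Java 8+ simply degrades to O(log n) once the thresholds are crossed. Implementing Comparable helps the tree order keys with identical hashes.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    static final class BadKey implements Comparable<BadKey> {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return 42; }  // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
        @Override public int compareTo(BadKey o) { return Integer.compare(id, o.id); }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 1000; i++) map.put(new BadKey(i), i);
        // All 1000 entries share one bucket, yet the map stays correct.
        System.out.println(map.size() + " " + map.get(new BadKey(500)));  // 1000 500
    }
}
```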


7️⃣ get() — Fast or Slow Depends on You

map.get(key);
        

Internal flow:

  1. Compute hash
  2. Calculate bucket index
  3. Traverse the bucket: compare the cached hash first, then == / equals()

💥 Bad equals() = slow map
💥 Mutable key = broken map
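Here is the mutable-key failure in miniature (Key is a hypothetical class): the Node caches the hash computed at put() time, so after the key mutates, get() computes a different hash and the entry becomes unreachable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {
    static class Key {
        String id;
        Key(String id) { this.id = id; }
        @Override public int hashCode() { return Objects.hash(id); }
        @Override public boolean equals(Object o) {
            return o instanceof Key && Objects.equals(((Key) o).id, id);
        }
    }

    public static void main(String[] args) {
        Map<Key, String> map = new HashMap<>();
        Key k = new Key("A");
        map.put(k, "value");
        k.id = "B";                      // mutate the key AFTER insertion
        System.out.println(map.get(k));  // null: hash no longer matches
        System.out.println(map.size());  // 1: the entry still occupies memory
    }
}
```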

8️⃣ equals() + hashCode(): The Silent Killers

Production Bug Example

class Employee {
    String empId;
}
        

Used as HashMap key.

Problems:

  • hashCode not overridden → identity hash → two Employee objects with the same empId hash differently
  • equals not overridden → identity comparison → a lookup with a new instance always misses

Result:

  • get(new Employee("E42")) returns null even though that empId is "in" the map
  • Duplicate entries accumulate for the same logical key
  • And with a low-entropy hashCode the opposite failure appears: all entries land in one bucket, the linked list grows, CPU spikes, latency explodes

Correct Implementation

@Override
public int hashCode() {
    return Objects.hash(empId);
}

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof Employee)) return false;
    Employee e = (Employee) o;
    return Objects.equals(empId, e.empId);
}
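A quick check that the fix behaves as intended (EmployeeKeyDemo is an illustrative harness): two distinct instances carrying the same empId now act as the same logical key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class EmployeeKeyDemo {
    static final class Employee {
        final String empId;   // final keeps the key effectively immutable
        Employee(String empId) { this.empId = empId; }
        @Override public int hashCode() { return Objects.hash(empId); }
        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Employee)) return false;
            return Objects.equals(empId, ((Employee) o).empId);
        }
    }

    public static void main(String[] args) {
        Map<Employee, String> salaries = new HashMap<>();
        salaries.put(new Employee("E42"), "90k");
        salaries.put(new Employee("E42"), "95k");  // replaces, not duplicates
        System.out.println(salaries.size());                    // 1
        System.out.println(salaries.get(new Employee("E42")));  // 95k
    }
}
```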
        

9️⃣ Resizing (Rehashing) — The Hidden Performance Bomb

Defaults:

initialCapacity = 16
loadFactor = 0.75
        

Resize happens when:

size > capacity × loadFactor
        

What happens internally:

  • New array (double size)
  • Every node is repositioned
  • hashCode() is not re-invoked (the cached hash is reused), but every index is recalculated
  • CPU intensive
  • GC pressure increases

⚠️ In high-QPS systems, resize = latency spike
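The arithmetic, plus the standard pre-sizing idiom (a sketch; the exact number of resizes you save depends on your data):

```java
import java.util.HashMap;
import java.util.Map;

public class ResizeMath {
    public static void main(String[] args) {
        int capacity = 16;
        float loadFactor = 0.75f;
        int threshold = (int) (capacity * loadFactor);
        // A default HashMap resizes on the put that pushes size past 12.
        System.out.println("resize after " + threshold + " entries");  // 12

        // Growing from 16 buckets to ~1M entries costs over a dozen
        // resizes, each touching every node. Pre-sizing avoids all of them:
        int expected = 1_000_000;
        Map<String, String> big = new HashMap<>((int) (expected / loadFactor) + 1);
        System.out.println("pre-sized map ready: " + big.isEmpty());   // true
    }
}
```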

🔟 Why HashMap Is NOT Thread-Safe

Concurrent put operations can cause:

  • Lost updates
  • Infinite loops (JDK 7)
  • Data corruption

Famous JDK 7 Bug

Concurrent resize → linked list cycle → CPU stuck at 100%

👉 This is why ConcurrentHashMap exists
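A minimal sketch of the safe alternative: eight writer threads on one map. The threads use disjoint key ranges, so the final size is deterministic with ConcurrentHashMap; a plain HashMap under the same load can lose updates or corrupt buckets.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[8];
        for (int t = 0; t < 8; t++) {
            final int offset = t * 10_000;   // disjoint key range per thread
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) map.put(offset + i, i);
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        System.out.println(map.size());  // 80000, every time
    }
}
```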

1️⃣1️⃣ Memory & GC Impact (Rarely Discussed)

Each HashMap entry:

  • Node object
  • Key object
  • Value object
  • Next pointer

Large maps:

  • Increase old-gen usage
  • Increase GC pause time
  • Affect latency SLAs

💡 Solution:

  • Use primitive maps (FastUtil, HPPC) when needed
  • Pre-size HashMap
  • Avoid long-lived giant maps

1️⃣2️⃣ Best Practices (Battle-Tested)

✔ Use immutable keys
✔ Override equals & hashCode
✔ Pre-size HashMap
✔ Avoid custom mutable objects as keys
✔ Never share HashMap across threads
✔ Use ConcurrentHashMap for concurrency
✔ Monitor resize & GC in production

1️⃣3️⃣ Interview Questions That Actually Matter

Why bucket size 8? → Tree cost vs list traversal trade-off; with a good hash, chains of 8 are vanishingly rare

Why capacity ≥ 64 for treeification? → Avoid tree overhead for small maps

Why does HashMap allow a null key? → hash(null) is defined as 0, so it always lands in bucket 0

Why does iteration order change? → Resize redistributes nodes across buckets
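The null key in action, so the last two answers are easy to verify:

```java
import java.util.HashMap;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        // hash(null) is hard-wired to 0, so null always lands in bucket 0.
        map.put(null, "stored under hash 0");
        System.out.println(map.get(null));          // stored under hash 0
        System.out.println(map.containsKey(null));  // true
    }
}
```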

🔚 Final Reality Check

HashMap is:

  • Simple to use
  • Dangerous to misuse
  • Extremely optimized internally

HashMap does not fail. Developers fail to understand HashMap.

