Rabin–Karp Algorithm: Efficient String Searching with Hashing

Rabin–Karp Algorithm: Efficient String Searching with Hashing

📘 Introduction

String matching is one of the most fundamental problems in computer science, with applications ranging from search engines to DNA sequencing. While brute-force approaches can be slow, the Rabin–Karp Algorithm introduces a clever use of hashing to make pattern searching more efficient.

🧠 The Idea Behind Rabin–Karp Algorithm

Instead of comparing the pattern with every substring character by character, Rabin–Karp converts both the pattern and substrings into hash values.

  • If the hash values match, only then a direct comparison is performed to confirm.
  • This reduces unnecessary comparisons and speeds up the search process.

Think of it like checking the fingerprint of text chunks before doing a full identity check.

⚙️ Algorithmic Steps

Here’s how Rabin–Karp Algorithm works step-by-step:

1.Compute the hash of the pattern.

2.Compute the hash of the first substring of the text (same length as the pattern).

3.Slide the window one character at a time:

  • Update the hash efficiently (rolling hash).
  • Compare with the pattern’s hash.
  • If hashes match, verify with direct comparison.

4.Continue until the end of the text.

💻 Rabin–Karp Algorithm Implementation

def rabin_karp(text, pattern, prime=101):
    """
    Rabin-Karp Algorithm for string matching.
    text: The main string where we search.
    pattern: The substring we want to find.
    prime: A prime number used for hashing (to reduce collisions).
    """

    m, n = len(pattern), len(text)   # m = length of pattern, n = length of text
    pattern_hash = 0                 # hash value for the pattern
    text_hash = 0                    # hash value for current window of text
    h = 1                            # multiplier for rolling hash

    d = 256  # number of possible characters (ASCII range)

    # Precompute h = pow(d, m-1) % prime
    # This helps in removing the leading character during rolling hash
    for i in range(m-1):
        h = (h * d) % prime

    # Initial hash values for pattern and first window of text
    for i in range(m):
        pattern_hash = (d * pattern_hash + ord(pattern[i])) % prime
        text_hash = (d * text_hash + ord(text[i])) % prime

    # Slide the pattern over text one by one
    for i in range(n - m + 1):
        # If hash values match, check characters one by one
        if pattern_hash == text_hash:
            if text[i:i+m] == pattern:
                print(f"Pattern found at index {i}")

        # Calculate hash for next window (rolling hash)
        if i < n - m:
            text_hash = (d * (text_hash - ord(text[i]) * h) + ord(text[i+m])) % prime

            # Python can give negative values, so adjust
            if text_hash < 0:
                text_hash += prime


# Example
text1 = "ABCCDDAEFG"
pattern1 = "CDD"

rabin_karp(text1, pattern1)

# Output
# Pattern found at index 3        

📊 Time & Space Complexity

  • Best Case: O(n+m) (when hash mismatches are frequent)
  • Worst Case: O(n⋅m) (when many hash collisions occur)
  • Average Case: O(n+m)
  • Space Complexity: O(1) (only a few extra variables for hashing)

⚖️ Strengths & Weaknesses

✅ Strengths:

  • Efficient for multiple pattern searches.
  • Uses hashing to skip unnecessary comparisons.
  • Simple and elegant rolling hash mechanism.

❌ Weaknesses:

  • Hash collisions can degrade performance.
  • Requires careful choice of hash function and prime number.

🔑 Key Takeaways

  • Rabin–Karp is a hash-based string matching algorithm that shines when searching for multiple patterns.
  • It balances efficiency with simplicity, though hash collisions can be a drawback.
  • Understanding Rabin–Karp builds strong intuition for how hashing can optimize algorithms beyond just data storage.

To view or add a comment, sign in

More articles by Dharmendra Sharma

Explore content categories