Rabin–Karp Algorithm: Efficient String Searching with Hashing
📘 Introduction
String matching is one of the most fundamental problems in computer science, with applications ranging from search engines to DNA sequencing. While brute-force approaches can be slow, the Rabin–Karp Algorithm introduces a clever use of hashing to make pattern searching more efficient.
🧠 The Idea Behind Rabin–Karp Algorithm
Instead of comparing the pattern with every substring character by character, Rabin–Karp converts both the pattern and substrings into hash values.
Think of it like checking the fingerprint of text chunks before doing a full identity check.
⚙️ Algorithmic Steps
Here’s how Rabin–Karp Algorithm works step-by-step:
1.Compute the hash of the pattern.
2.Compute the hash of the first substring of the text (same length as the pattern).
3.Slide the window one character at a time:
4.Continue until the end of the text.
💻 Rabin–Karp Algorithm Implementation
def rabin_karp(text, pattern, prime=101):
"""
Rabin-Karp Algorithm for string matching.
text: The main string where we search.
pattern: The substring we want to find.
prime: A prime number used for hashing (to reduce collisions).
"""
m, n = len(pattern), len(text) # m = length of pattern, n = length of text
pattern_hash = 0 # hash value for the pattern
text_hash = 0 # hash value for current window of text
h = 1 # multiplier for rolling hash
d = 256 # number of possible characters (ASCII range)
# Precompute h = pow(d, m-1) % prime
# This helps in removing the leading character during rolling hash
for i in range(m-1):
h = (h * d) % prime
# Initial hash values for pattern and first window of text
for i in range(m):
pattern_hash = (d * pattern_hash + ord(pattern[i])) % prime
text_hash = (d * text_hash + ord(text[i])) % prime
# Slide the pattern over text one by one
for i in range(n - m + 1):
# If hash values match, check characters one by one
if pattern_hash == text_hash:
if text[i:i+m] == pattern:
print(f"Pattern found at index {i}")
# Calculate hash for next window (rolling hash)
if i < n - m:
text_hash = (d * (text_hash - ord(text[i]) * h) + ord(text[i+m])) % prime
# Python can give negative values, so adjust
if text_hash < 0:
text_hash += prime
# Example
text1 = "ABCCDDAEFG"
pattern1 = "CDD"
rabin_karp(text1, pattern1)
# Output
# Pattern found at index 3
📊 Time & Space Complexity
⚖️ Strengths & Weaknesses
✅ Strengths:
❌ Weaknesses:
🔑 Key Takeaways