Z-Algorithm: Fast String Matching for Modern Problems
📘 Introduction
String matching is one of the most fundamental problems in computer science, with applications ranging from search engines to DNA sequencing. Among the many algorithms designed to solve this efficiently, the Z-Algorithm stands out for its simplicity and linear-time performance.
🧠 The Idea Behind Z-Algorithm
The Z-Algorithm computes an array called the Z-array, where each entry Z[i] represents the length of the longest substring starting at position i that matches the prefix of the string.
👉 Example: For the string "aabxaabxcaabxaabx", the Z-array helps us quickly identify repeated patterns by comparing substrings with the prefix.
This clever trick allows us to solve pattern matching problems in O(n) time, where n is the length of the string.
⚙️ Algorithmic Steps
Here’s how Z Algorithm works step-by-step:
💻 Z Algorithm Implementation
def z_algorithm(s):
n = len(s)
Z = [0] * n # Initialize Z-array with zeros
l, r = 0, 0 # [l, r] marks the current Z-box (substring window)
for i in range(1, n):
# Case 1: i lies inside the current Z-box
if i <= r:
# Use previously computed values to avoid re-computation
Z[i] = min(r - i + 1, Z[i - l])
# Case 2: Try to extend Z[i] by comparing characters
while i + Z[i] < n and s[Z[i]] == s[i + Z[i]]:
Z[i] += 1
# Update Z-box if the new substring extends beyond current [l, r]
if i + Z[i] - 1 > r:
l, r = i, i + Z[i] - 1
return Z
# Example:
text = "aabxaabxcaabxaabx"
pattern = "aabx"
combined = pattern + "$" + text # Concatenate pattern and text with a delimiter
Z = z_algorithm(combined)
# Scan Z-array to find matches
for i in range(len(Z)):
if Z[i] == len(pattern): # Match found when Z-value equals pattern length
print("Pattern found at index:", i - len(pattern) - 1)
# Output
# Pattern found at index: 0
# Pattern found at index: 4
# Pattern found at index: 9
# Pattern found at index: 13
📊 Time & Space Complexity
⚖️ Strengths & Weaknesses
✅ Strengths:
❌ Weaknesses:
🔑 Key Takeaways