Z-Algorithm: Fast String Matching for Modern Problems

Z-Algorithm: Fast String Matching for Modern Problems

📘 Introduction

String matching is one of the most fundamental problems in computer science, with applications ranging from search engines to DNA sequencing. Among the many algorithms designed to solve this efficiently, the Z-Algorithm stands out for its simplicity and linear-time performance.


🧠 The Idea Behind Z-Algorithm

The Z-Algorithm computes an array called the Z-array, where each entry Z[i] represents the length of the longest substring starting at position i that matches the prefix of the string.

👉 Example: For the string "aabxaabxcaabxaabx", the Z-array helps us quickly identify repeated patterns by comparing substrings with the prefix.

This clever trick allows us to solve pattern matching problems in O(n) time, where n is the length of the string.


⚙️ Algorithmic Steps

Here’s how Z Algorithm works step-by-step:

  1. Concatenate the pattern and the text with a special delimiter (e.g., pattern$text).
  2. Compute the Z-array for this combined string.
  3. Whenever a Z-value equals the length of the pattern, we’ve found an occurrence of the pattern in the text.


💻 Z Algorithm Implementation

def z_algorithm(s):
    n = len(s)
    Z = [0] * n   # Initialize Z-array with zeros
    l, r = 0, 0   # [l, r] marks the current Z-box (substring window)

    for i in range(1, n):
        # Case 1: i lies inside the current Z-box
        if i <= r:
            # Use previously computed values to avoid re-computation
            Z[i] = min(r - i + 1, Z[i - l])

        # Case 2: Try to extend Z[i] by comparing characters
        while i + Z[i] < n and s[Z[i]] == s[i + Z[i]]:
            Z[i] += 1

        # Update Z-box if the new substring extends beyond current [l, r]
        if i + Z[i] - 1 > r:
            l, r = i, i + Z[i] - 1

    return Z


# Example:
text = "aabxaabxcaabxaabx"
pattern = "aabx"
combined = pattern + "$" + text   # Concatenate pattern and text with a delimiter
Z = z_algorithm(combined)

# Scan Z-array to find matches
for i in range(len(Z)):
    if Z[i] == len(pattern):   # Match found when Z-value equals pattern length
        print("Pattern found at index:", i - len(pattern) - 1)


# Output
# Pattern found at index: 0
# Pattern found at index: 4
# Pattern found at index: 9
# Pattern found at index: 13        


📊 Time & Space Complexity

  • Time Complexity: O(n) linear time for computing the Z-array.
  • Space Complexity: O(n) requires storage of the Z-array.


⚖️ Strengths & Weaknesses

✅ Strengths:

  • Linear time efficiency: Handles large strings quickly.
  • Simplicity: Easy to implement compared to other advanced string algorithms.
  • Versatility: Useful for multiple problems like substring search, palindrome detection, and compression.

❌ Weaknesses:

  • Extra space usage: Requires an auxiliary Z-array.
  • Less intuitive: Understanding the Z-box concept can be tricky for beginners.


🔑 Key Takeaways

  • The Z-Algorithm is a powerful tool for string matching in O(n) time.
  • It leverages the Z-array to compare substrings with the prefix efficiently.
  • Despite requiring extra space, its speed and versatility make it a go-to choice in many applications.

To view or add a comment, sign in

More articles by Dharmendra Sharma

Explore content categories