Boyer–Moore Algorithm: The Algorithm That Skips Ahead

Boyer–Moore Algorithm: The Algorithm That Skips Ahead

📘 Introduction

In the world of computer science, efficient string searching is a cornerstone of text processing, compilers, and even everyday applications like search engines or word processors. Among the many algorithms designed for this purpose, the Boyer–Moore Algorithm, introduced by Robert S. Boyer and J Strother Moore in 1977, stands out as one of the most efficient and widely adopted

🧠 The Idea Behind Boyer–Moore Algorithm

Unlike the naive approach that checks every character sequentially, Boyer–Moore leverages two powerful heuristics:

  • Bad Character Rule: If a mismatch occurs, the algorithm shifts the pattern to align with the last occurrence of the mismatched character in the pattern.
  • Good Suffix Rule: If a suffix of the pattern matches but a mismatch occurs earlier, the algorithm shifts the pattern to align with another occurrence of that suffix.

Together, these heuristics allow the algorithm to skip large portions of text, making it significantly faster in practice than many alternatives.

⚙️ Algorithmic Steps

Here’s how Boyer–Moore Algorithm works step-by-step:

  1. Preprocess the pattern to build bad character and good suffix tables.
  2. Align the pattern with the text and compare characters from right to left.
  3. On mismatch, use the precomputed tables to decide how far to shift the pattern.
  4. Repeat until the pattern is found or the text is exhausted.

💻 Boyer–Moore Algorithm Implementation

# Boyer-Moore String Search Algorithm Implementation 

# Step 1: Preprocess for Bad Character Rule
def bad_char_heuristic(pattern):
    """
    Creates a dictionary storing the last occurrence of each character in the pattern.
    Used for the Bad Character Rule.
    """
    bad_char = {}
    for i in range(len(pattern)):
        bad_char[pattern[i]] = i
    return bad_char


# Step 2: Boyer-Moore Search Function
def boyer_moore_search(text, pattern):
    """
    Searches for occurrences of 'pattern' in 'text' using Boyer-Moore algorithm.
    Returns a list of starting indices where pattern is found.
    """
    m = len(pattern)
    n = len(text)
    bad_char = bad_char_heuristic(pattern)  # Preprocess pattern

    result = []  # Stores indices of matches
    s = 0  # Shift of the pattern with respect to text

    while s <= n - m:
        j = m - 1

        # Compare pattern with text from right to left
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1

        # If pattern is found
        if j < 0:
            result.append(s)
            # Shift pattern so next character in text aligns with last occurrence in pattern
            s += (m - bad_char.get(text[s + m], -1)) if s + m < n else 1
        else:
            # Shift pattern using Bad Character Rule
            s += max(1, j - bad_char.get(text[s + j], -1))

    return result


# Step 3: Example
text = "ABAAABCDABCABCDABCD"
pattern = "ABCD"

matches = boyer_moore_search(text, pattern)

if matches:
     print("Pattern found at indices:", matches)
else:
     print("Pattern not found.")

# Output
# Pattern found at indices: [4, 11, 15]        

📊 Time & Space Complexity

  • Best Case: O(n/m) — when mismatches allow large jumps.
  • Average Case: Sublinear, often faster than other algorithms in practice.
  • Worst Case: O(mn), though rare.
  • Space Complexity: O(k+m), where k is the alphabet size and m is the pattern length

⚖️ Strengths & Weaknesses

✅ Strengths:

  • Efficient in practice, especially with large alphabets.
  • Skips unnecessary comparisons using heuristics.
  • Widely used in text editors, search engines, and bioinformatics.

❌ Weaknesses:

  • Preprocessing overhead for small patterns.
  • Worst-case performance can degrade to O(mn).
  • More complex to implement compared to simpler algorithms like Naive or Rabin–Karp.

🔑 Key Takeaways

  • Boyer–Moore is a landmark algorithm in string searching, combining clever heuristics to achieve practical efficiency.
  • It’s best suited for applications where large texts and complex patterns are common.
  • While not always optimal in worst-case scenarios, its average-case performance makes it a go-to choice in real-world systems.

To view or add a comment, sign in

More articles by Dharmendra Sharma

Others also viewed

Explore content categories