Knuth-Morris-Pratt (KMP) Algorithm

Knuth-Morris-Pratt (KMP) Algorithm

The Knuth-Morris-Pratt (KMP) algorithm is a pattern-matching algorithm that efficiently searches for occurrences of a pattern within a larger string of text. It uses a pre-processing step to build a "Pi table" that stores the length of the longest proper prefix of the pattern that is also a suffix of each prefix of the pattern. During the matching phase, the algorithm compares the characters of the pattern with the characters of the text, and uses the Pi table to avoid unnecessary backtracking. The KMP algorithm has a time complexity of O(m+n), and is widely used in applications such as text editors, search engines, and bioinformatics.

Code:

import java.util.*


class HelloWorld {
    public static void main(String[] args) {
        // declare variables
        int i = 0;
        int j = 0;
        int g = 0;
        int h = 0;
        int k = 0;
        int c = 2;
        int d = 2;
        String x;
        String y;
        
        // get input from user
        System.out.println("Enter the String: ");
        Scanner sc= new Scanner(System.in);
        String str= sc.nextLine();
        
        System.out.println("Enter the Pattern: ");
        Scanner cs= new Scanner(System.in);
        String ptrn= cs.nextLine();
        
        // convert input strings to character arrays
        char[] ch1 = str.toCharArray();
        char[] ch2 = ptrn.toCharArray();
        
        // get length of character arrays
        int n = ch1.length;
        int m = ch2.length;
        int o = m;
        
        // initialize Pi table
        int PT[]=new int[m];
        
        // compute Pi table using while loop
        while (g < m-1){
            if (ptrn.substring(0,1).equals(ptrn.substring(g+1,g+2))){
                while ((h < m) && (g < m-1) && (ptrn.substring(0+h,1+h).equals(ptrn.substring(g+1,g+2)))){
                    PT[g+1] = h + 1;
                    g++;
                    h ++;
                }
                h = 0;
            }
            if ((h < m) && (g < m-1) && (ptrn.substring(0,1).equals(ptrn.substring(g+1,g+2)))){
                g = g;
            }
            else {
                g++;
            }
        }
        
        // print Pi table
        System.out.println("Pi Table:");
        for(int z=0;z<PT.length;z++){
            System.out.print(ch2[z] + " ");
        }
        System.out.print("\n");
        for(int z=0;z<PT.length;z++){
            System.out.print(PT[z] + " ");
        }
        System.out.print("\n");
        
        // perform pattern matching using KMP algorithm
        while (i < n){
            if (i < n && j < m && (ch1[i] == ch2[j])){
                while (i < n && j < m && (ch1[i] == ch2[j])){
                    i++;
                    j++;
                }
            }
            if (j == m){
                k++;
                System.out.print("Pattern found in the string at the index = ");
                System.out.println(i-j);
            }
            if (j != 0){
                j = PT[j-1];
            }
            if (i < n && j < m && (ch1[i] != ch2[j])){
                i++;
            }
        }
        
        // print result
        if (k == 0){
            System.out.print("Pattern not found in the string");
        }
    }
}        

The above code is a Java implementation of the Knuth-Morris-Pratt (KMP) algorithm, which is a pattern-matching algorithm that searches for occurrences of a pattern within a larger string of text.

The code starts by declaring and initializing variables for the indices i, j, g, h, k, c, d, and o, and the strings x and y. It then prompts the user to enter the string and pattern to be searched for, and reads in the input using Scanner objects. The input strings are converted to character arrays using the toCharArray() method.

The code then creates an array PT to store the "Pi table," which is used by the KMP algorithm to avoid unnecessary backtracking. The Pi table is computed using a while loop that iterates over the characters of the pattern. For each character, the algorithm checks if it is equal to the first character of the pattern. If it is, it then checks if the subsequent characters match, and sets the value of PT accordingly.

After computing the Pi table, the algorithm performs the matching phase using another while loop that iterates over the characters of the text. For each character, the algorithm checks if it matches the current character of the pattern. If it does, it continues to compare subsequent characters until a complete match is found. If a match is found, the algorithm prints the index at which the pattern was found. If a mismatch occurs, the algorithm uses the Pi table to determine how much to backtrack in the pattern before continuing to search for a match.

Finally, the algorithm outputs a message indicating whether or not the pattern was found in the string.

Overall, the code provides a functional implementation of the KMP algorithm for pattern matching in Java, using user input and the Pi table to efficiently search for occurrences of a pattern within a larger string of text.

This implementation of the KMP pattern matching algorithm is better than other naive implementations because it makes use of the Pi table. The Pi table is a pre-processing step that allows the algorithm to efficiently backtrack when it encounters a mismatch between the pattern and the text.

The Pi table is computed using a while loop that iterates over the characters of the pattern. For each character, the algorithm checks if it is equal to the first character of the pattern. If it is, it then checks if the subsequent characters match, and sets the value of PT accordingly. This step is important because it allows the algorithm to skip over sections of the pattern that have already been matched, rather than starting the search from the beginning of the pattern each time a mismatch is encountered.

Using the Pi table in this way greatly improves the efficiency of the KMP algorithm, particularly for large strings of text and/or patterns. This implementation also allows for user input, making it more flexible and adaptable to a variety of use cases.

Sample Output:

Enter the String:
ababcababcababcababd
Enter the Pattern: 
ababcababd

Pi Table:
a b a b c a b a b d 
0 0 1 2 0 1 2 3 4 0 

Pattern found in the string at the index = 10         

That is awesome Vifert Daniel! I appreciate your Effort, to learn and share. 👏

Like
Reply

To view or add a comment, sign in

More articles by Vifert Daniel

  • Radix Sort [Java]

    Radix sort is an integer sorting algorithm that sorts data with integer keys by grouping the keys by the individual…

    4 Comments

Others also viewed

Explore content categories