Data structure (Java) KMP algorithm

1, Application scenario - string matching

1) There is a string str1= "BBC abcdab abcdabde ", and a substring str2="ABCDABD " 
2) Now it's time to judge str1 Whether it contains str2 , If present, return to the first occurrence , If not, return -1

 

2, Solution - violence matching

If we use the idea of violent matching and assume that str1 is now matched to position i and the substring str2 is matched to position j, then:

1) If the current character matches successfully (i.e str1[ i ] == str2[j] ), then i ++ , j++ , continue to match the next character
2) If mismatch (i.e str1[ i ]! = str2[j] ), order i = i - (j - 1) , j = 0 . It is equivalent to that each time a match fails, i to flash back , j Set as 0 .
3) There will be a lot of violence to flash back , move only at a time One , if it does not match, move to the next bit and then judge, which wastes a lot of time. ( infeasible !)
  public static int violenceMatch(String source, String target) {
        int i = 0, j = 0;
        char[] s1 = source.toCharArray();
        char[] s2 = target.toCharArray();
        //If any index is out of bounds, it will end. Generally, if i is out of bounds, it may not be found. If j is out of bounds, it is found
        while (i < source.length() && j < target.length()) {
            count++;
            //The first character is matched successfully, and then the following character is matched
            if (s1[i] == s2[j]) {
                i++;
                j++;
            } else {
                //Failed to match. Start matching again from the next position
                i = i - j + 1;
                j = 0;
            }
            //j points to the end of the target string. The match is successful
            if (j == target.length()) return i - j;
        }
        return -1;
    }

3, Solution KMP algorithm

1) KMP It is a classical algorithm to solve whether the pattern string has appeared in the text string, and if so, the earliest position
2) Knuth-Morris-Pratt String lookup algorithm , abbreviated as“ KMP Algorithm ", often used in a text string S Find a pattern string in P This algorithm is composed of Donald Knuth , Vaughan Pratt , James H. Morris Three people in 1977 Jointly published in, so this 3 The last name of the person named this algorithm .
3) KMP Method the algorithm uses the previously judged information to A next array , save before and after the mode string Longest common subsequence of length , at each backtrace, through next Array to find the previously matched position, which saves a lot of computing time.
In fact, the difference between kmp algorithm and violence algorithm is that violence algorithm only moves back one bit in each match, while kmp algorithm dynamically calculates the number of bits moved back in each match by using the next array. The next array takes advantage of the target string and gets it before matching.

   

3.1 prefix and suffix

3.2 next array and partial matching value:

3.3 examples

If we have already matched here, we can see that the next character does not match.

 

If the violence method is used, the next matching position should be the last position of the first matching position:

 

Using KMP algorithm, it is calculated that the next backward shift is 4 bits:

(number of matched characters (6) - partial matching value of the last matched character (2))

IV. code demonstration of KMP

1. First, get the next array. This is an online version. I don't understand it. I achieved the same result with other ideas in the main program

 //Get the online version of the next array. I can't understand it
    public static int[] Next1(String s) {
        int[] next = new int[s.length()];
        next[0] = 0;//The first element is 0 anyway, which does not match itself
        //Index backward
        for (int i = 1, k = 0; i < s.length(); i++) {
            while (k > 0 && s.charAt(k) != s.charAt(i)) k = next[k - 1];//
            if (s.charAt(k) == s.charAt(i)) k++;//If a match is found, make the matching length +1
            next[i] = k;//Assign k to the corresponding element of the next, that is, the matching length, regardless of whether it is matched each time
        }
        return next;
    }

2. main program

Note: the idea of next array:

      1. i points to the first bit of the string, and j points to the last bit of the first bit of the string. J moves backward continuously until i and j move backward at the same time when they encounter the same character as the first character. At this time, the matching length is recorded in the next array at the same time.

      2. Once a different character is encountered, i returns to the first place of the string, and j always moves back. Clear the matching length variable and wait for the next matching.

      3. If the matching is successful again, repeat the operation similar to 1 (i and j move back at the same time). Until J points to the end of the string.

public static Integer count=0;//Match count, compare the two methods
public static int KMPMatch(String source,String target){
         //1. Get the next array, which is the key of kmp algorithm, so that we can decide where to start matching next time
         int[] next = new int[target.length()];
         int matchedLength = 0, i = 0, j = 1;
         next[0] = 0;//The first element is 0 anyway, which does not match itself
         //i points to the first character and j moves back.
         while (j < target.length()) {
             //If the characters indicated by i and j match successfully, they will be moved back at the same time, and the length of the matched characters will be recorded at the same time
             if (target.charAt(i) == target.charAt(j)) {
                 next[j] = ++matchedLength;
                 i++;
                 //Otherwise, i will be moved to the first place and the matching length will be cleared
             } else {
                 i = 0;
                 matchedLength = 0;
             }
             //j always traverse backwards
             j++;
         }
         //2. Real matching process
         int k = 0, l = 0;//k points to the source string, l points to the destination string
         int matched=0;
         char[] s1 = source.toCharArray();
         char[] s2 = target.toCharArray();
         //If any index is out of bounds, it will end. Generally, if i is out of bounds, it may not be found. If j is out of bounds, it is found
         while (k < source.length() && l < target.length()) {
             count++;
             //The first character is matched successfully, and then the following character is matched
             if (s1[k] == s2[l]) {
                 k++;
                 l++;
                 matched++;

             } else {
                 //The matching is unsuccessful. The matching starts again from the back position, which has been optimized by kmp algorithm
                 if(matched>=1) k=k-l+matched-next[matched-1];//If the character length is not 0, select the position to start the comparison next time according to the algorithm
                 else k=k-l+1;//If the character length is 0, it will return to the next bit of the original position like the violence method
                 l = 0;
                 matched=0;
             }
             //l points to the end of the target string, and the match is successful
             if (l == target.length()) return k - l;
         }
         return -1;
     }

5, Final comparison

@Test
    public void testViolenceMatch() {
        String source = "BBC ABCDAB ABCDABCDABDE";
        String target = "ABCDABD";
        System.out.println(solution.violenceMatch(source, target));
        System.out.println("Matching times:"+solution.count);
    }



15
 Matching times: 36



    @Test
    public void testKMPMatch() {
        String source = "BBC ABCDAB ABCDABCDABDE";
        String target = "ABCDABD";
        System.out.println(solution.KMPMatch(source,target));
        System.out.println("Matching times:"+solution.count);
    }



15
 Matching times: 29

Tags: data structure

Posted by 486974 on Sat, 04 Jun 2022 01:37:40 +0530