# KMP Algorithm

Difficulty Level Hard
Pattern Searching StringViews 3013

KMP(Knuth-Morris-Pratt) algorithm is used for pattern searching in a given string. We are given a string S and a pattern p, our goal is to determine whether or not the given pattern is present in the string.

Input:
S = “aaaab”
p = “aab”
Output:
true

## Naive Approach

The naive approach for pattern searching is to match the given pattern character by character in the given string.

### Example

S= “aaaab”            ->Match
p= “aab”

S= “aaaab”           ->Match
p= “aab”

S= “aaaab”           ->Mismatch
p= “aab

S= “aaaab”           ->Match
p= “aab”

S= “aaaab”           ->Match
p= “aab”

S = “aaaab”           ->Mismatch
p = “aab

S = “aaaab”           ->Match
p = “aab”

S = “aaaab”           ->Match
p = “aab”

S = “aaaab”           ->Match
p = “aab

Pattern p is found in S.
In the above approach we are reverting back our progress in String S on encountering a mismatch, so the time complexity of the naive pattern matching algorithm is O(length of S x length of p).

## KMP Algorithm

The idea of KMP algorithm is to save the progress and eliminate the reverting back in the main String(S), it is achieved by pre-processing the given pattern(p).
The algorithm is similar to the naive approach, we start searching for pattern_p in String S, character by character, and if we encounter a mismatch, then instead of reverting back in the main string S, we revert back our position in pattern p, to a point, where the suffix is also a prefix in the matched part of the pattern.

### Example

S= “abcaxabcab”           ->Match
p= “abcab”

S= “abcaxabcab”           ->Match
p= “abcab”

S= “abcaxabcab”           ->Match
p= “abcab”

S= “abcaxabcab”           ->Match
p = “abcab”

S= “abcaxabcab”           ->Mismatch
p = “abcab

At this point the matched part of pattern_p in the string s is “abca”, in this matched part “a” at index 4 is a suffix that is also a prefix(“a” at index 1), so we revert back in p to the character b(next to the prefix “a”) and continue matching.

S= “abcaxabcab”           ->Mismatch
p = “abcab”

The matched part of patterns

p in string S is “a”, there is no suffix that is also a prefix, so we revert back to the first character in p and continue matching using KMP Algorithm.

S= “abcaxabcab”           ->Mismatch
p = “abcab”

The mismatch occurred at index 1, so we advance in the String S.

S = “abcaxabcab”           (Match)
p = “abcab”

S = “abcaxabcab”           (Match)
p = “abcab”

S = “abcaxabcab”           (Match)
p = “abcab”

S = “abcaxabcab”           (Match)
p = “abcab”

S = “abcaxabcab”           (Match)
p = “abcab We have found p in S.

### Algorithm

1. Create an array named table, of size equals to the length of pattern_p, table[i] stores the length of the longest suffix that is also a prefix from index 1 to i.
2. Initialize table = 0, because for index 0, there is no suffix which is also a prefix
3. Initialize i = 0 and j = 1, run a loop while j < n(length of pattern_p), repeat steps 4 and 5
4. If p[i] and p[j] matches, table[j] = i + 1, increment i and j
5. If p[i] and p[j] mismatches, and i is not equals to 0, i = table[i – 1], and if i = 0, then table[j] = 0 and increment j

### Pseudo Code for KMP Algorithm

```table = 0
i = 0, j = 1
while (j < n) { // n is the length of pattern p
if (p[i] == p[j]) {
table[j] = i + 1;
i++;
j++;
} else {
if (i != 0) {
i = table[i - 1];
// Do not increment j here
} else {
table[j] = 0;
j++;
}
}
}```

Examples

pattern : “aaab”
table[] = {0, 1, 2, 0}

table = {0, 0, 0, 0, 0, 1, 2, 3, 0}

### Algorithm

1. Pre-process the pattern to create the prefix-suffix table array as mentioned above.
2. We start matching the given string and pattern starting from the first character and keep incrementing in both the strings till we are matching the characters if we matched all the characters, the pattern is found and we stop.
3. As we see a mismatch, let mismatch occurs at index i in the pattern, we revert back to table[i – 1] index in the pattern or at index 0 if the mismatch occurs at index 0.
4. Repeat steps 2 and 3, till we traverse the whole string or we found the pattern.

### JAVA Code for KMP Algorithm

```public class KMPAlgorithm {
// Function to pre-process the pattern and to create the suffix-prefix table
private static void preProcess(char[] p, int[] table, int n) {
// Initialize table as 0
table = 0;
// Initialise i = 0 and j = 1
int i = 0, j = 1;
// Run a loop till length of pattern, if p[i] and p[j] matches table[j] = i + 1 else table[j] = 0
while (j < n) {
if (p[j] == p[i]) {
// Match found set table[i] = i + 1, and advance i and j
table[j] = i + 1;
i++; j++;
} else {
if (i != 0) {
i = table[i - 1];
// We do not increment j here
} else {
table[j] = 0;
j++;
}
}
}
}

public static void main(String[] args) {
char S[] = "abcaabcab".toCharArray();
char p[] = "abcab".toCharArray();

// Pre-process the pattern
int table[] = new int[p.length];
preProcess(p, table, p.length);

// Searching the pattern in the given string
int i = 0; // index for string
int j = 0; // index for pattern
boolean found = false; // Boolean variable to indicate that the pattern is found or not

while (i < S.length) {
if (S[i] == p[j]) {
// Advance forward the characters are same
i++;
j++;
} else {
// Characters are not same, revert back the progress in the pattern
if (j !=0) // Reset the position of j in the pattern, do no advance in string
j = table[j - 1];
else // first character is mis-matched, advance in the string
i++;
}

if (j == p.length) {
// The pattern is found in the string
found = true;
break;
}
}

if (found)
System.out.println("true");
else
System.out.println("false");
}
}```
`true`

### C++ Code for KMP Algorithm

```#include <bits/stdc++.h>
using namespace std;
void preProcess(char *p, int *table, int n) {
// Initialize table as 0
table = 0;
// Initialise i = 0 and j = 1
int i = 0, j = 1;
// Run a loop till length of pattern, if p[i] and p[j] matches table[j] = i + 1 else table[j] = 0
while (j < n) {
if (p[i] == p[j]) {
// Match found set table[i] = i + 1, and advance i and j
table[j] = i + 1;
i++;
j++;
} else {
if (i != 0) {
i = table[i - 1];
// We do not increment j here
} else {
table[j] = 0;
j++;
}
}
}
}

int main() {
char s[] = "abcaabcab";
char p[] = "abcab";
int n = strlen(s);
int m = strlen(p);

// Pre-process the pattern
int table[m];
preProcess(p, table, m);

// Searching the pattern in the given string
int i = 0;      // index for string
int j = 0;      // index for pattern
bool found = false; // Boolean variable to indicate that the pattern is found or not
while (i < n) {
if (s[i] == p[j]) {
// Advance forward the characters are same
i++;
j++;
} else {
// Characters are not same, revert back the progress in the pattern
if (j != 0)  {
// Reset the position of j in the pattern, do no advance in string
j = table[j - 1];
} else {
// first character is mis-matched, advance in the string
i++;
}
}

if (j == m) {
// The pattern is found in the string
found = true;
break;
}
}
if (found)
cout<<"true"<<endl;
else
cout<<"false"<<endl;

return 0;
}```
`true`

### Complexity Analysis for KMP Algorithm

In KMP algorithm there is no reverting back in the String S, so the time complexity is of the order of length of S and it is also proportional to the length of pattern_p, for preprocessing, therefore,

Time Complexity = O(length of String S + length of Pattern p)

References

Translate »