# Regular expression matching problem

Author: Grey

Original link: Regular expression matching problem

## Question link

LeetCode 10 Regular Expression Matching

## Violent solution

Filter out invalid parameters first, such as:

Cannot have in s string And * characters,

In a p string, two * cannot be adjacent, and * cannot appear at the beginning of the P string.

In the above two cases, return false directly.

Next, we define recursive functions

```boolean process0(char[] s, char[] p, int si, int pi)
```

Recursion means: whether the p string starts from pi to the last character can match the s string from si to the last character.

If the recursive meaning is defined above, the main function call

```process0(s, p, 0, 0)
```

You can get the result.

The next step is base case,

Obviously, when pi reaches the end position, that is, when there is no matching string, si must also reach the end position to be a match, otherwise it is impossible to match

``` if (pi == p.length) {
return si == s.length;
}
```

If there is no end position, the description:

pi is not over yet

If si comes to the end at this time, the string after pi must be digested into an empty string to match. To make pi and its later digested into an empty string, pi and its later must be an even length string, because if it is an odd length string, it will not become an empty string in any case. In addition to pi and subsequent strings that must conform to even length, pi and subsequent strings must meet several:

```Valid characters+'*'
```

Only in this way can valid characters be * digested into empty strings.

```        // pi is not over yet
if (si == s.length) {
// si is over
if (((p.length - pi) & 1) == 1) {
// pi and subsequent characters must first be even numbers, and the remaining odd numbers will become empty strings.
return false;
}
if (pi + 1 < p.length && p[pi + 1] == '*') {
// Must be followed by a combination of: valid characters + "*"
return process0(s, p, si, pi + 2);
}
return false;
}
```

If the above if conditions are not met, it indicates:

Neither si nor pi

At this time, if PI reaches the last position of a valid character, or if the next position of PI is not *, p[pi] must face s[si] independently. At this time, if you want to match up, you must first meet

```s[si] == p[pi] || p[pi] == '.'
```

Secondly, the s string can be matched from si+1 to the last by the p string from pi+1 to the last. The logic is as follows:

```        // Neither si nor pi ends
if (pi == p.length - 1 || p[pi + 1] != '*') {
return (s[si] == p[pi] || p[pi] == '.') && process0(s, p, si + 1, pi + 1);
}
```

If you have also escaped the above judgment, it indicates that:

PI is not the last position, and p[pi+1] = = '*'

Then p[pi] and p[pi+1] can at least be resolved into empty strings first, that is, the position of p[pi] is not matched. The logic is as follows:

```// P[pi] and p[pi+1] are first resolved into empty strings, and the si is directly matched to the pi+2 position.
if (process0(s, p, si, pi + 2)) {
return true;
}
```

If you skip this step, it means that the path of resolving p[pi] into an empty string is not feasible, so you can only make: p[pi] match s[si], and then derive the * at the position of p[pi+1]:

0 p[pi]

1 p[pi]

2 p[pi]

......

n p[pi]

Then match the rest of the s string.

```        for (int i = si; i < s.length; i++) {
if (p[pi] == s[i] || p[pi] == '.') {
// P[pi] is matched with s[si], and then n p[pi] are derived from * at p[pi+1] position for matching
// Directly return true as long as the match is made
if (process0(s, p, i + 1, pi + 2)) {
return true;
}
} else {
// p[pi] does not match the previous s[si], and returns false directly
return false;
}
}
```

The complete code is as follows:

```    public static boolean isMatch0(String s, String p) {
if (s == null || p == null) {
return false;
}
char[] str = s.toCharArray();
char[] pStr = p.toCharArray();
return isValid(str, pStr) && process0(str, pStr, 0, 0);
}

// Filter out invalid characters first
private static boolean isValid(char[] str, char[] exp) {
for (char c : str) {
// str string cannot have And*
if (c == '.' || c == '*') {
return false;
}
}
for (int i = 0; i < exp.length; i++) {
// *Cannot be in the first position of exp
// Two * cannot be connected together
if (exp[i] == '*' && (i == 0 || exp[i - 1] == '*')) {
return false;
}
}
return true;
}

private static boolean process0(char[] s, char[] p, int si, int pi) {
if (pi == p.length) {
return si == s.length;
}
// pi is not over yet
if (si == s.length) {
// si is over
if (((p.length - pi) & 1) == 1) {
// pi and subsequent characters must first be even numbers, and the remaining odd numbers will become empty strings.
return false;
}
if (pi + 1 < p.length && p[pi + 1] == '*') {
// Must be followed by a combination of: valid characters + "*"
return process0(s, p, si, pi + 2);
}
return false;
}
// Neither si nor pi ends
if (pi == p.length - 1 || p[pi + 1] != '*') {
return (s[si] == p[pi] || p[pi] == '.') && process0(s, p, si + 1, pi + 1);
}
// PI is not the last position, and p[pi+1] = = '*'
// P[pi] and p[pi+1] can at least be resolved into empty strings first, that is, the position of p[pi] does not match
if (process0(s, p, si, pi + 2)) {
return true;
}
// If we get to this point, p[pi] will be reduced to an empty string, which will not work
// So only: p[pi] can match s[si]
// Then derive '*' of p[pi+1]:
// 0 p[pi]
// 1 p[pi]
// 2 p[pi]
// ...
// n p[pi]
for (int i = si; i < s.length; i++) {
if (p[pi] == s[i] || p[pi] == '.') {
if (process0(s, p, i + 1, pi + 2)) {
return true;
}
} else {
break;
}
}
return false;
}
```

## Dynamic programming solution

In the violence method, there are two variable parameters of the recursive function, and they are simple parameters, so they can be changed to two-dimensional dynamic programming. We have sorted out the possibility and lattice dependency by ourselves. I have sorted it out twice by myself. The complete code is:

```    public static boolean isMatch2(String s, String p) {
if (s == null || p == null) {
return false;
}
char[] str = s.toCharArray();
char[] pStr = p.toCharArray();
if (!isValid(str, pStr)) {
return false;
}
boolean[][] dp = new boolean[str.length + 1][pStr.length + 1];
// The last column, except dp[str.length][pStr.length] = true
// The rest are false
dp[str.length][pStr.length] = true;
// Last line
dp[str.length][pStr.length - 1] = false;
for (int i = pStr.length - 2; i >= 0; i--) {
if (((pStr.length - i) & 1) == 1) {
dp[str.length][i] = false;
} else if (i + 1 < pStr.length && pStr[i + 1] == '*') {
dp[str.length][i] = dp[str.length][i + 2];
} else {
dp[str.length][i] = false;
}
}
// Penultimate column
for (int i = str.length - 1; i >= 0; i--) {
dp[i][pStr.length - 1] = ((str[i] == pStr[pStr.length - 1] || pStr[pStr.length - 1] == '.') && dp[i + 1][pStr.length]);
}
for (int i = str.length - 1; i>=0;i--) {
for (int j = pStr.length - 2; j >=0;j--) {
if (pStr[j+1]!='*') {
dp[i][j] = (str[i] == pStr[j] || pStr[j] == '.') && dp[i + 1][j + 1] ;
} else if (dp[i][j+2]) {
dp[i][j] = true;
} else {
for (int k = i; k < str.length; k++) {
if (pStr[j] == str[k] || pStr[j] == '.') {
if (dp[k + 1][j+2]) {
dp[i][j]= true;
break;
}
} else {
dp[i][j] = false;
break;
}
}
}
}
}
return dp[0][0];
}
// Filter out invalid characters first
private static boolean isValid(char[] str, char[] exp) {
for (char c : str) {
if (c == '.' || c == '*') {
return false;
}
}
for (int i = 0; i < exp.length; i++) {
if (exp[i] == '*' && (i == 0 || exp[i - 1] == '*')) {
return false;
}
}
return true;
}
```

## Enumeration behavior optimization

Through the dynamic programming solution, a place that can be optimized is found. If the following for loop in the dynamic programming solution can be omitted, the efficiency of the algorithm can be much higher,

```                    for (int k = i; k < str.length; k++) {
if (pStr[j] == str[k] || pStr[j] == '.') {
if (dp[k + 1][j+2]) {
dp[i][j]= true;
break;
}
} else {
// dp[i][j] = false;
break;
}
}
```

Through analysis, it can be concluded that for a common position (i, j), the above for loop actually depends on:

Dependency (i+1, j+2), (i+2, j+2), (i+3, j+2)

When we were seeking (i+1, j), we relied on the following positions:

(i+2，j+2)，(i+3，j+2)......

Therefore, the position of (I, J) can be derived from (i+1, J) and (i+1, j+2). The above for loop is simplified, and the optimized code is as follows:

```    public static boolean isMatch3(String s, String p) {
if (s == null || p == null) {
return false;
}
char[] str = s.toCharArray();
char[] pStr = p.toCharArray();
if (!isValid(str, pStr)) {
return false;
}
boolean[][] dp = new boolean[str.length + 1][pStr.length + 1];
// The last column, except dp[str.length][pStr.length] = true
// The rest are false
dp[str.length][pStr.length] = true;
// Last line
dp[str.length][pStr.length - 1] = false;
for (int i = pStr.length - 2; i >= 0; i--) {
if (((pStr.length - i) & 1) == 1) {
dp[str.length][i] = false;
} else if (i + 1 < pStr.length && pStr[i + 1] == '*') {
dp[str.length][i] = dp[str.length][i + 2];
} else {
dp[str.length][i] = false;
}
}
// Penultimate column
for (int i = str.length - 1; i >= 0; i--) {
dp[i][pStr.length - 1] = ((str[i] == pStr[pStr.length - 1] || pStr[pStr.length - 1] == '.') && dp[i + 1][pStr.length]);
}
for (int i = str.length - 1; i >= 0; i--) {
for (int j = pStr.length - 2; j >= 0; j--) {
if (pStr[j + 1] != '*') {
dp[i][j] = (str[i] == pStr[j] || pStr[j] == '.') && dp[i + 1][j + 1];
} else if (dp[i][j + 2]) {
dp[i][j] = true;
} else if ((pStr[j] == str[i] || pStr[j] == '.') && (dp[i + 1][j + 2] || dp[i + 1][j])) {
dp[i][j] = true;
}
}
}
return dp[0][0];
}

// Filter out invalid characters first
private static boolean isValid(char[] str, char[] exp) {
for (char c : str) {
if (c == '.' || c == '*') {
return false;
}
}
for (int i = 0; i < exp.length; i++) {
if (exp[i] == '*' && (i == 0 || exp[i - 1] == '*')) {
return false;
}
}
return true;
}
```

## more

Algorithm and data structure notes

Posted by Tracekill on Tue, 31 May 2022 20:21:37 +0530