P8306 [template] dictionary tree solution

After writing the solution for the afternoon, I finally found that I couldn't submit it, so I put it here

subject

Title Description

given n n n mode strings s 1 , s 2 , ... , s n s_1, s_2, \dots, s_n s1, s2,..., sn and q q q queries, each given a text string t i t_i ti, please answer s 1 ∼ s n s_1 \sim s_n s1 ∼ how many strings are there in sn s j s_j sj satisfied t i t_i ti yes s j s_j Prefix of sj.

A string t t t yes s s Prefix of s if and only if from s s Delete several (can be 0) consecutive characters at the end of s and t t t same.

The input string is sensitive to size. For example, the string Fusu is different from the string Fusu.

Input format

There are multiple groups of test data in the test point of this question sheet.

The first line of input is an integer, indicating the number of data groups T T T.

For each group of data, the format is as follows:
The first line is two integers, representing the number of pattern strings respectively n n n and the number of queries q q q.
next n n n lines, one string per line, representing a pattern string.
next q q Line q, a string per line, represents a query.

Output format

Output the answers of each test data in the order of input.
For each query, output an integer on a line to represent the answer.

Example #1

Sample input \1

3
3 3
fusufusu
fusu
anguei
fusu
anguei
kkksc
5 2
fusu
Fusu
AFakeFusu
afakefusu
fusuisnotfake
Fusu
fusu
1 1
998244353
9

Sample output \1

2
1
0
1
2
1

Tips

Data scale and agreement

For all test points, ensure that 1 ≤ T , n , q ≤ 1 0 5 1 \leq T, n, q\leq 10^5 1 ≤ T,n,q ≤ 105, and the total length of the input string does not exceed 3 × 1 0 6 3 \times 10^6 three × 106. The input string only contains upper and lower case letters and numbers, and does not contain empty strings.

Solution

This question is one with a low pass rate among the popularization / improvement questions.

First first (skip)

What is Trie tree?

"Trie" translates to "sort".

So -- "sort tree"?

Well, we call it "dictionary tree".

And what is it for?

What if I give you a bunch of strings and ask you if a string exists?

Violent Mei Jue............. (think too much).

Therefore, we need to use trie tree.

The following figure:

Trie tree is a tree.

Root node r o o t root root is empty, and all other child nodes contain a character.

Each node has some different child nodes.

In this way, if you want to find out whether the string "ecea" has ever appeared, you can only start from the root node and query each corresponding node in turn. If a node is queried and the corresponding child node is not found, it indicates that this string does not exist; If you can finish the query, this string exists.

The following figure:

In addition, we also need to mark the end of each string to determine whether there is a string that ends with this character.

Let's look at this problem again

given n n n mode strings s 1 , s 2 , ... , s n s_1, s_2, \dots, s_n s1, s2,..., sn and q q q queries, each given a text string t i t_i ti, please answer s 1 ∼ s n s_1 \sim s_n s1 ∼ how many strings are there in sn s j s_j sj satisfied t i t_i ti yes s j s_j Prefix of sj.

A string t t t yes s s Prefix of s if and only if from s s Delete several (can be 0) consecutive characters at the end of s and t t t same.

The input string is sensitive to size. For example, the string Fusu is different from the string Fusu.

The question requires a judgment prefix. Also mark + + +Insert + + +Query.

Let's talk more about it

1. Storage implementation of trie tree

  • s o n son The son array is used to store the child node information of each node, s o n [ i ] [ j ] son[i][j] son[i][j] indicates node i i i has child nodes j j j .

  • c n t cnt cnt array is used for marking, c n t [ i ] cnt[i] cnt[i] record to node i i The number of strings at the end of i.

  • i d x idx idx represents the total number of nodes, each time when inserting + + ++ ++ .

const int N = 1000010;
int son[3 * N][65], cnt[3 * N], idx;

2. Each child node of the trie tree contains only upper and lower case letters and numbers

The topic has requirements:

For all test points, ensure that 1 ≤ T , n , q ≤ 1 0 5 1 \leq T, n, q\leq 10^5 1 ≤ T,n,q ≤ 105, and the total length of the input string does not exceed 3 × 1 0 6 3 \times 10^6 three × 106. The input string only contains upper and lower case letters and numbers, and does not contain empty strings.

Therefore, write a function that converts characters into numeric values for convenient storage.

int ctoi(char c){
	int x;
	if('A' <= c && c <= 'Z') x = c - 'A';				//Capital letters correspond to 0 ~ 25
	else if('a' <= c && c <= 'z') x = c - 'a' + 26;		//Lower case letters correspond to 26 ~ 51, so subtract'a'and add 26
	else x = c - '0' + 52;								//The number corresponds to 52 ~ 61. Similarly, subtract'0'and add 52
	return x;
}

3. Insert string

Insert a string into the Trie tree and write a function.

void insert(char str[]){
	int p = 0;							//p is used to temporarily record the current node number
	for(int i = 0; str[i]; ++i){		//Loop every character in the string
		int u = ctoi(str[i]);			//u is the value of the child node
		if(!son[p][u]) son[p][u] = ++idx;	
        //If the current node does not have this child node, expand a new child node and add one to the total number of nodes as the number of the new node
		p = son[p][u];					//Record the current node number
		++cnt[p];						//Mark, no more details
	}
} 

4. Query string

Similar to inserting.

int query(char str[]){
	int p = 0;							//Same as p in insert
	for(int i = 0; str[i]; ++i){		//Same as insertion
		int u = ctoi(str[i]);			//Same as insertion
		if(!son[p][u]) return 0;		//If there is no corresponding child node, that is, it cannot be found, it directly returns 0
		p = son[p][u];					//Same as insertion
	}
	return cnt[p];
    //Returns the number of strings. Note here that if it is found but not marked, the returned cnt[p] is still 0
}

Last last (Code)

#include <iostream>
#include <cstdio>
#include <cstring>
#include <algorithm>

using namespace std;

const int N = 1000010;

char str[3 * N];
int son[3 * N][65],cnt[3 * N],idx;

int ctoi(char c){
	int x;
	if('A' <= c && c <= 'Z') x = c - 'A';
	else if('a' <= c && c <= 'z') x = c - 'a' + 26;
	else x = c - '0' + 52;
	return x;
}

void insert(char str[]){
	int p = 0;
	for(int i = 0; str[i]; ++i){
		int u = ctoi(str[i]);
		if(!son[p][u]) son[p][u] = ++idx;
		p = son[p][u];
		++cnt[p];
	}
} 

int query(char str[]){
	int p = 0;
	for(int i = 0; str[i]; ++i){
		int u = ctoi(str[i]);
		if(!son[p][u]) return 0;
		p = son[p][u];
	}
	return cnt[p];
}
int t;
int main(){
	scanf("%d",&t);
	while(t--){
		for(int i = 0; i <= idx; ++i)
            for(int j = 0; j <= 64; ++j)
                son[i][j] = 0;
        for(int i = 0; i <= idx; i++)
            cnt[i] = 0;
        //Empty the array before each group of data
		idx = 0;
		int n,q;
		scanf("%d%d", &n, &q);
		while(n--){
			scanf("%s", str);
			insert(str);//insert
		}
		while(q--){
			scanf("%s", str);
			printf("%d\n", query(str));//query
		}
	}
	return 0;
}

Is it really the last?

This is the first explanation of konjaku, which is very detailed (self-confidence).

Please support me.

Thanks

Tags: Algorithm C++ data structure

Posted by GESmithPhoto on Sun, 24 Jul 2022 21:55:35 +0530