How MySQL Implements Field split Function --- Famous SQL Practice Seeks Top 10 Tags

For table building statements, see the article:
SQL Exercise 1 - Famous Language SQL Exercise

The data is as follows:

What are the top 10 tags?
The SQL implementation is written as follows:

select tmp.tag,count(tag) amount from (
select q.id ,q.tags,ht.help_topic_id,
substring_index( substring_index(q.tags,'/',ht.help_topic_id+1) ,'/',-1) tag
from quote q left join mysql.help_topic ht on ht.help_topic_id < length (q.tags) - length (replace(q.tags,'/',''))+1
) tmp group by tag order by amount desc limit 10;

Many people ask me why I write like this? What's the idea? Here's why I write like this
Steps to achieve:
First look at the labels for all the data

select tags from quote;

You can see that the data is long like this:

Idea: The tags field of each data contains multiple tags and is separated by'/'. Since tags are long like this, the top 10 tags must be sliced before the hot tags can be counted. The difficulty of this problem is how to split.

Steps to achieve:
1. Query labels for all data

select tags from quote;


2. Testing the substring_index function

select tags,substring_index(tags,'/',1) from quote ;

3. Look at mysql.help_topic and use the id column data of the system table mysql.help_top as a cyclic variable (because id is a continuous variable suitable for cycling; this one can build a similar table by itself)

select * from mysql.help_topic ht ;


4. Join quote tables with system tables to determine that tags field data for each data needs to be split into segments
length (q.tags) - length (replace(q.tags,'/',')+1: Used to determine through a delimiter that the current data needs to be cut into segments, and the ht.help_topic_id value just acts as a circular variable for each intercept (each tag)

select q.id ,q.tags,ht.help_topic_id
from quote q left join mysql.help_topic ht on ht.help_topic_id < length (q.tags) - length (replace(q.tags,'/',''))+1;

5. Intercept each tag segment in turn by substring_index function and loop variable

select q.id ,q.tags,ht.help_topic_id,
substring_index(q.tags,'/',ht.help_topic_id+1)  tag
from quote q left join mysql.help_topic ht on ht.help_topic_id < length (q.tags) - length (replace(q.tags,'/',''))+1;

6. Obtain each individual label by intercepting the label segment twice

select q.id,q.tags,ht.help_topic_id,substring_index(q.tags,'/',ht.help_topic_id+1),  
substring_index( substring_index(q.tags,'/',ht.help_topic_id+1) ,'/',-1) tag
from quote q left join mysql.help_topic ht on ht.help_topic_id < length (q.tags) - length (replace(q.tags,'/',''))+1;

7. Final query statement:

select tmp.tag,count(tag) amount from (
select q.id ,q.tags,ht.help_topic_id,
substring_index( substring_index(q.tags,'/',ht.help_topic_id+1) ,'/',-1) tag
from quote q left join mysql.help_topic ht on ht.help_topic_id < length (q.tags) - length (replace(q.tags,'/',''))+1
) tmp group by tag order by amount desc limit 10;


It might be a little hard to understand here. Again, if you do, skip it:
length (q.tags) - length (replace(q.tags,'/',')+1: Used to determine through a delimiter that the current data needs to be cut into segments, and the ht.help_topic_id value just acts as a circular variable for each intercept (each tag)
Looking at this diagram, you know what to expect. Also, because length (q.tags) - length (replace(q.tags,'/',')) calculates the number of separators ('/'), and the number of tags intercepted is one more than the separator, so it ends up with + 1

Tags: Database MySQL SQL

Posted by trex005 on Wed, 22 Sep 2021 22:35:05 +0530