For table building statements, see the article:
SQL Exercise 1 - Famous Language SQL Exercise
The data is as follows:
What are the top 10 tags?
The SQL implementation is written as follows:
select tmp.tag,count(tag) amount from ( select q.id ,q.tags,ht.help_topic_id, substring_index( substring_index(q.tags,'/',ht.help_topic_id+1) ,'/',-1) tag from quote q left join mysql.help_topic ht on ht.help_topic_id < length (q.tags) - length (replace(q.tags,'/',''))+1 ) tmp group by tag order by amount desc limit 10;
Many people ask me why I write like this? What's the idea? Here's why I write like this
Steps to achieve:
First look at the labels for all the data
select tags from quote;
You can see that the data is long like this:
Idea: The tags field of each data contains multiple tags and is separated by'/'. Since tags are long like this, the top 10 tags must be sliced before the hot tags can be counted. The difficulty of this problem is how to split.
Steps to achieve:
1. Query labels for all data
select tags from quote;
2. Testing the substring_index function
select tags,substring_index(tags,'/',1) from quote ;
3. Look at mysql.help_topic and use the id column data of the system table mysql.help_top as a cyclic variable (because id is a continuous variable suitable for cycling; this one can build a similar table by itself)
select * from mysql.help_topic ht ;
4. Join quote tables with system tables to determine that tags field data for each data needs to be split into segments
length (q.tags) - length (replace(q.tags,'/',')+1: Used to determine through a delimiter that the current data needs to be cut into segments, and the ht.help_topic_id value just acts as a circular variable for each intercept (each tag)
select q.id ,q.tags,ht.help_topic_id from quote q left join mysql.help_topic ht on ht.help_topic_id < length (q.tags) - length (replace(q.tags,'/',''))+1;
5. Intercept each tag segment in turn by substring_index function and loop variable
select q.id ,q.tags,ht.help_topic_id, substring_index(q.tags,'/',ht.help_topic_id+1) tag from quote q left join mysql.help_topic ht on ht.help_topic_id < length (q.tags) - length (replace(q.tags,'/',''))+1;
6. Obtain each individual label by intercepting the label segment twice
select q.id,q.tags,ht.help_topic_id,substring_index(q.tags,'/',ht.help_topic_id+1), substring_index( substring_index(q.tags,'/',ht.help_topic_id+1) ,'/',-1) tag from quote q left join mysql.help_topic ht on ht.help_topic_id < length (q.tags) - length (replace(q.tags,'/',''))+1;
7. Final query statement:
select tmp.tag,count(tag) amount from ( select q.id ,q.tags,ht.help_topic_id, substring_index( substring_index(q.tags,'/',ht.help_topic_id+1) ,'/',-1) tag from quote q left join mysql.help_topic ht on ht.help_topic_id < length (q.tags) - length (replace(q.tags,'/',''))+1 ) tmp group by tag order by amount desc limit 10;
It might be a little hard to understand here. Again, if you do, skip it:
length (q.tags) - length (replace(q.tags,'/',')+1: Used to determine through a delimiter that the current data needs to be cut into segments, and the ht.help_topic_id value just acts as a circular variable for each intercept (each tag)
Looking at this diagram, you know what to expect. Also, because length (q.tags) - length (replace(q.tags,'/',')) calculates the number of separators ('/'), and the number of tags intercepted is one more than the separator, so it ends up with + 1