MySQL experiment subquery optimization two parameter limit

MySQL experiment subquery optimization two parameter limit

Unexpectedly, there is still room for optimization of the two parameter limit. In order to see it with your own eyes, let's experiment it today.

 

Experimental preparation

Use the official MySQL database employees for experiments, See this for importing the sample library

To use the employees table, first check the table structure and the number of records in the table

mysql> desc employees;
+------------+---------------+------+-----+---------+-------+
| Field      | Type          | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no     | int(11)       | NO   | PRI | NULL    |       |
| birth_date | date          | NO   |     | NULL    |       |
| first_name | varchar(14)   | NO   |     | NULL    |       |
| last_name  | varchar(16)   | NO   |     | NULL    |       |
| gender     | enum('M','F') | NO   |     | NULL    |       |
| hire_date  | date          | NO   |     | NULL    |       |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
mysql> select count(*) from employeed;
ERROR 1146 (42S02): Table 'employees.employeed' doesn't exist
mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
1 row in set (0.05 sec)

We can see that only the primary key emp_no indexed

 

Experimental process

Explanation of Explain parameters on the official MySQL5.7 website

 

Use unoptimized two parameter limit

Run the two parameter limit and explain under normal conditions:

mysql> select * from employees limit 200000,10;
+--------+------------+------------+------------+--------+------------+
| emp_no | birth_date | first_name | last_name  | gender | hire_date  |
+--------+------------+------------+------------+--------+------------+
| 299976 | 1952-12-08 | Kristian   | Kampfer    | M      | 1994-12-28 |
| 299977 | 1956-09-30 | Zsolt      | Benveniste | M      | 1994-08-15 |
| 299978 | 1956-08-08 | Anneli     | Kitai      | F      | 1994-08-09 |
| 299979 | 1953-03-18 | Satoru     | Kornyak    | F      | 1991-06-16 |
| 299980 | 1953-05-26 | Marsal     | Lovengreen | M      | 1988-05-09 |
| 299981 | 1960-06-22 | Claudi     | Mamelak    | M      | 1986-07-13 |
| 299982 | 1955-06-21 | Juichirou  | Hiraishi   | M      | 1989-12-17 |
| 299983 | 1964-11-19 | Bezalel    | Iacovou    | M      | 1998-02-22 |
| 299984 | 1961-11-03 | Frazer     | Birch      | M      | 1986-12-31 |
| 299985 | 1961-01-04 | Miomir     | Nergos     | F      | 1996-07-07 |
+--------+------------+------------+------------+--------+------------+
10 rows in set (0.06 sec)
mysql> explain select * from employees limit 200000,10;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 299468 |   100.00 | NULL  |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+-------+
1 row in set, 1 warning (0.00 sec)

We analyze explain:

  • The type is ALL, and the whole table is scanned, that is, no index is used, so key and key_len is NULL

  • There is no reference between tables, so ref is NULL. Here is a single table query

  • Partition is null, indicating that the partition table is not used / accessed

  • 299468 rows were scanned (limit does not filter null and empty values. Why is this data? Please see the official for answers)

  • Filtered here means that N% of the data left after the server layer filtering from the storage engine meets the query criteria, and 100% means that the rows are not filtered.

  • EXTRA explanation: Using filesort, when the index cannot be used for sorting

Note: the two parameter limit is to scan to the place indicated by the offset after sorting (here is row 100001), then read 10 rows and throw away the first 100000 rows.

 

Subquery optimization limit

Optimization idea: first find out the primary key value of the first row of data to be selected in the sub query by using the "overlay index" method, and then select 10 pieces of data from here according to the primary key value

mysql> select * from employees where emp_no >= (select emp_no from employees limit 200000,1) limit 10;
+--------+------------+------------+------------+--------+------------+
| emp_no | birth_date | first_name | last_name  | gender | hire_date  |
+--------+------------+------------+------------+--------+------------+
| 299976 | 1952-12-08 | Kristian   | Kampfer    | M      | 1994-12-28 |
| 299977 | 1956-09-30 | Zsolt      | Benveniste | M      | 1994-08-15 |
| 299978 | 1956-08-08 | Anneli     | Kitai      | F      | 1994-08-09 |
| 299979 | 1953-03-18 | Satoru     | Kornyak    | F      | 1991-06-16 |
| 299980 | 1953-05-26 | Marsal     | Lovengreen | M      | 1988-05-09 |
| 299981 | 1960-06-22 | Claudi     | Mamelak    | M      | 1986-07-13 |
| 299982 | 1955-06-21 | Juichirou  | Hiraishi   | M      | 1989-12-17 |
| 299983 | 1964-11-19 | Bezalel    | Iacovou    | M      | 1998-02-22 |
| 299984 | 1961-11-03 | Frazer     | Birch      | M      | 1986-12-31 |
| 299985 | 1961-01-04 | Miomir     | Nergos     | F      | 1996-07-07 |
+--------+------------+------------+------------+--------+------------+
10 rows in set (0.03 sec)

As you can see, the query speed has doubled

mysql> explain select * from employees where emp_no >= (select emp_no from employees limit 200000,1) limit 10;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table     | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | PRIMARY     | employees | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL | 149734 |   100.00 | Using where |
|  2 | SUBQUERY    | employees | NULL       | index | NULL          | PRIMARY | 4       | NULL | 299468 |   100.00 | Using index |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
2 rows in set, 1 warning (0.03 sec)

explain:

  • The second line, subquery, refers to the subquery. Let's start with the subquery

    The type is index, which indicates that we use the index tree to speed up the query

    The key is the primary key, which means that we use the primary key index. The sub query is directly queried on the primary key index tree, avoiding back to the table and reducing disk I/O

  • The first row is an external query

    If the type is range, it means that it is a range query, and then the primary key index tree is also used

    Using index means that only the information in the index tree is used to retrieve column information from the table, and the actual rows can be read without performing other searches.

 

Summary

For the direct two parameter limit that does not require order by, we can optimize the query efficiency by covering the index.

Optimize order by+limit see here

 

Tags: MySQL

Posted by indigo2k on Tue, 31 May 2022 04:19:45 +0530