Mysql Index Optimization and Lock Mechanism

1. Architecture of MySQL

  1. Storage Engine
Contrast ItemMyISAMInnoDB
Primary Foreign KeyI won't support itSupport
affairI won't support itSupport
Row table lockTable locks, which lock the entire table even if one record is manipulated, are not suitable for highly concurrent operationsRow lock, which locks only one row and does not affect other rows during operation. Suitable for high concurrent operation
cacheCache index only, not real dataNot only does it cache the index, but it also caches the real data, which requires more memory, and memory size has a decisive impact on performance.
TablespaceSmalllarge
Focusperformanceaffair
Default InstallationYY
  1. mysql logical architecture is divided into four layers

    • Connection layer: Links to other languages, such as Perl, Python
    • Service layer: mysql query, internal optimization
    • Engine layer: pluggable engine, innoDB, myISAM
    • Storage Layer: Hardware
  2. View Engine

    show engines;
    show variables like '%engine%';
    

2. Index

1. Performance degradation SQL slow, long execution time, long wait time reason

  • Bad Query Statement Writing

  • Index Failure

    • Single-valued index (only one field of a table is indexed when looking up)

      select * from user where name = '';
      /*
      Index by name field of user table
      */
      create index idx_user_name on user(name);
      
    • Composite Index

      select * from user where name = '' and email = '';
      /*
      Composite index based on name field and email of user table
      */
      create index idx_user_nameEmail on user(name,email);
      
  • Too many join queries associated (design flaws or unavoidable requirements)

  • Server tuning and parameter settings (buffer, number of threads, etc.)

2. How to optimize MySql

Summary of mysql optimization:

  • Open slow query, capture slow sql
  • explain+slow sql analysis
  • show profile queries the execution details and life cycle of sql within the Mysql server
  • Tuning sql database server parameters

3. What is an index

  • Index: is to help mysql obtain data more efficiently [data structure]
  • It can be simply understood as: fast, ordered lookup of data structures
  • In addition to the data itself, the database maintains a data structure that satisfies a specific search algorithm, which points to the data in some way so that advanced search algorithms can be implemented on top of the data, which is called an index.
  • Indexes are often stored on disk as index files
  • Index often refers to an index organized by a B-tree (multiple search tree) structure
  • Clustered index, full-text index, composite index, prefix index, unique index all use B+tree index by default, collectively referred to as index.
  • In addition to the B+tree type, there is a hash index
  • Primary key index must be unique, and conversely not

4. Advantages and disadvantages of indexing

  • advantage
    • Improve the efficiency of data retrieval and reduce the cost of database IO
    • Reduce sorting costs and CPU consumption
  • Inferiority
    • Indexes are actually tables that hold primary keys and index fields and point to records in entity tables, so index columns are also space-consuming
    • Increasing the query speed reduces the speed of updating tables (indexes change when you add or delete)
    • Large amount of data, time to research and build the best index

5. Index Classification

  • Single-valued index: An index contains only a single column, and a table can have multiple single-column indexes, but it is best to have no more than five single-value indexes for a table
  • Unique index: Values of index columns must be unique, but null values are allowed
  • Composite index: An index contains multiple columns

6. Basic Index Syntax

  • Establish

    CREATE INDEX indexName ON mytable(field);
    
  • delete

    DROP INDEX [indexName] ON mytable;
    
  • See

    show index from mytable;
    

7. There are four ways to add an index to a data table

  • ALTER TABLE table_name ADD PRIMARY KEY(column_list): The statement adds a primary key, which means the index value must be unique and cannot be null;
  • ALTER TABLE table_name ADD UNIQUE index_name (column_list): The value for which the statement creates an index must be unique (null may occur multiple times except for null)
  • ALTER TABLE table_name ADD INDEX index_name (column_list): Add a normal index, so it can only appear multiple times
  • ALTER TABLE table_name ADD FULLTEXT index_name (column_list): This statement specifies that the index is FULLTEXT for full-text indexing

8. mysql index structure

  • The structure of BTee index is mainly explored

[External chain picture transfer failed, source station may have anti-theft chain mechanism, it is recommended to save the picture and upload it directly (img-tgSEfa4Q-16211685749) (C:Usersp01550Desktopimagefade06f1bdf1d770d325d135cdff.png)]

  • At the bottom are the leaf nodes, only the leaf nodes hold the data, and the other nodes hold only the pointer and keyword
  • The parent node's keyword exists in the child node, either taking the maximum or minimum value of the child node, or the minimum value of the parent node's keyword if the child nodes are in ascending order
  • For example, look up 28, process: because 28 is greater than 20 and less than 35, the p2 pointer of data page 1 points to data page 3, because 28 is less than 30, so the p1 pointer of data page 3 points to data page 7, and finally the stored data of 28 is found

9. What do you need to create an index?

  • Primary Key Automatically Creates Unique Index
  • Fields frequently used as query criteria should be indexed
  • Fields associated with other tables in a query, indexed by foreign key relationships
  • Frequently updated fields are not suitable for creating an index because each update not only updates the record, but also updates the index, which increases the IO burden
  • Field not used in where condition does not create index
  • A propensity to create composite indexes under high concurrency
  • Sorted fields in a query. Sorted fields accessed through an index will greatly improve the sorting speed
  • Statistics or Grouping Fields in Queries

10. What do you not need to create an index

  • Too few table records (under 3 million)
  • Frequently added or deleted tables
  • Table fields with duplicate and evenly distributed data should be indexed only for the most frequently queried and sorted columns. Note that if a data column contains many duplicates, indexing it will not have much practical effect.

11. Performance Analysis Explain

  • Usage: explain sql statement;
  • Effect:
    • Reading order of tables
    • Operation type of data read operation
    • Which indexes can be used
    • Which index is actually used
    • References between tables
    • How many rows per table are queried by the optimizer
  • explain's field:
    • id
      • Same id, execution order from top to bottom
      • ID is different, if it is a subquery, the id's ordinal will increase, the higher the ID value, the higher the priority, the more it will be executed first
    • select_type
      • SIMPLE: Simplest query without subqueries and union
      • PRIMARY: Main query, outermost, last executed
      • Subqueries before SUBQUERY:from
      • A subquery that occurs in the DERIVED:from clause, also known as a derived table
      • Select_of the table following the UNION:union keyword Typees are unions
      • UNION RESULT: select to get results from the union table
    • table
      • Show which table this row data is about
    • type
      • Shows what types of queries are used, from good to bad: system>const>eq_ Ref>ref>range>index>All
      • system: system table, only one row of data, special case of const, ignored
      • const: Indicates that the index is found once and is used to compare the primary and unique indexes because only one row of data is matched, such as where to look for a value for the primary key
      • eq_ref: Commonly used for primary key and uniqueness index scans where only one record in the table matches each key
      • ref: A non-unique index scan that returns all rows matching a single value (qualified multiple rows)
      • Range: Retrieves only rows of a given range, using an index to select rows. The key column shows which index is used. In general, there are queries between, <, >, in, etc. in your where statement. This range scan index scan is better than full table scan and does not scan all indexes
      • Index:. The difference with all is that index traverses only the number of indexes
      • ALL: Traverse the full table
    • possible_keys
      • Displays the indexes that may be applied to this table, one or more (but not necessarily actually used by the query)
    • key
      • Actual Index [Override Index appears only in the key list if used in the query]
      • Override index: Select field and composite index number and order are the same, for example: index with c1, c2, query for select c1, c2 from...
    • key_len
      • The number of bytes used in the index, by which the length of the index used in the query can be calculated. Without loss of precision, [shorter is better]. This value is the maximum possible length of the index field, [not the actual length used]
      • That is key_len is calculated from a table, not retrieved from it
    • ref
      • Displays which column of the index is used, if possible, as a constant. Which columns or constants are used to find values on index columns. Fields associated with other tables in the query, foreign key relationships indexed.
    • rows
      • Based on table statistics and index selection, roughly estimate the number of rows to read to find the required records, the smaller the better
      • That is, how many rows per table are queried by the optimizer
    • Extra
      • Contains additional information that is not appropriate but important to display in other columns
        • using filesort: indicates that MySQL uses an external index sort on the data rather than reading it in the index order within the table. Sorting operations in MySQL that cannot be done using indexes become File Sorting.
        • using temporary: temporary tables are used to store intermediate results, and temporary tables are used by MYsql to sort query results
          Common in order by and group by
        • using index: override index is used in the query, [inefficient]
          There is also a using where indicating that the index is used to perform lookups of index key values (columns following where are used to build the index)
          The absence of using where indicates that the columns in the index are only for select ion, not where
        • using index condition: although there are index columns in the search criteria, some of the criteria cannot use the index. The criteria for using the index are searched once and then matched if the criteria for not using the index are available
        • using where: where is used
        • using join buffer: using buffer in configuration file is enlarged when connection cache is used
        • The impossible where:where clause always has a false value and cannot be used to get any tuples
        • selec table optimized away: optimize min/max based on index without group by clause
        • Distinct: Optimize the distinct operation to stop looking for the same action after finding the first matching tuple
    • Filtered: This refers to the percentage of rows returned as a result of the rows that need to be read (the value of the rows column)

12. Establishment and optimization of indexes

Index optimization for single tables: query category_ Article_with the most views with ID of 1 and comments >1 ID

CREATE table if not exists `article`(
    `id` int(10) unsigned not null primary key auto_increment,
    `author_id` int(10) unsigned not null,
    `category_id` int(10) unsigned not null,
    `views` int(10) unsigned not null,
    `comments` int(10) unsigned not null,
    `title` varbinary(255) not null,
    `content` text not null
);

insert into `article`(`author_id`,`category_id`,`views`,`comments`,`title`,`content`) values
(1,1,1,1,'1','1'),
(2,2,2,2,'2','2'),
(1,1,3,3,'3','3');

select * from article;
  1. explain query statement, found type is all, extra prompt using filesort, need to build index to optimize SQL
explain select id, author_id from article where category_id=1 and comments>1 order by views DESC limit 1;
  1. explain after the new index, type is range, but extra s still prompt using filesort, because comments are range queries, sql cannot use the index to retrieve subsequent views, that is, the index after range type query field is invalid. This index does not qualify, delete it and repeat
create index idx_article_ccv on article(category_id, comments, views);
  1. When an index is created, it crosses the comments field of the range query required. explain finds that the type is ref, and the using filesort is gone. The optimization is successful.
create index idx_article_cv on article(category_id, views);

Two-table index optimization: two-table index optimization, left-spelled right table indexed.

create table if not exists `class` (
    `id` int(10) unsigned not null AUTO_INCREMENT,
    `card` int(10) unsigned not null,
    primary key(`id`)
);
create table if not exists `book` (
    `bookid` int(10) unsigned not null AUTO_INCREMENT,
    `card` int(10) unsigned not null,
    primary key(`bookid`)
);
insert into class(card) values(FLOOR(1 + (RAND() * 20)));
insert into book(card) values(FLOOR(1 + (RAND() * 20)));
  1. explain query statement, found type is all, need to index to optimize SQL
explain select * from class left join book on class.card=book.card;
  1. Indexing the right table, analysis found that class is all, book is ref
Alter table `book` add index Y (`card`);
  1. Indexing left table, analysis found that class is index, book is all
  2. It is concluded that the index of two tables is optimized and the right table is indexed from the left.

Three-table index optimization: three-table joins, small tables driving large tables, and indexes built on tables behind left or right

create table if not exists `phone`(
`phoneid` int(10) unsigned not null auto_increment,
`card` int(10) unsigned not null,
primary key(`phoneid`)
)engine=InnoDB;
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));

show index from book;
drop index Y on book;

alter table `phone` add index z (`card`);
alter table `book` add index Y (`card`);

explain select * from class left join book on class.card=book.card left join phone on book.card=phone.card;

13. How to avoid index failure

create table staffs(
    id int primary key auto_increment,
    NAME varchar(24) not null default '' COMMENT 'Full name',
    age int not null default 0 comment 'Age',
    pos varchar(20) not null default '' COMMENT 'position',
    add_time timestamp not null default current_timestamp comment 'Enrollment Time'
)charset utf8 comment 'Employee Records';

insert into staffs(NAME, age, pos, add_time) values('z3', 22, 'manager', NOW());
insert into staffs(NAME, age, pos, add_time) values('July', 23, 'dev', NOW());
insert into staffs(NAME, age, pos, add_time) values('2000', 23, 'dev', NOW());

alter table staffs add index idx_staffs_nameAgePos(name, age, pos);

select * from staffs where age=25 and pos='dev';
select * from staffs where pos='dev';

explain select * from staffs where age=25 and pos='dev';
explain select * from staffs where pos='dev';

After analysis, the above index was found to be invalid.

Prevent index failure:

  • Full value matches my favorite (full time matches are recommended)
  • Best left prefix rule (most important)
    • The first field cannot be lost when indexing, otherwise the index will fail.
    • For example, the index is idx_staffs_nameAgePos(name, age, pos); You must start with name, not skip name direct age or POS.
  • Not doing anything on indexed columns (calculation, function, (automatic or manual) type conversion will invalidate the index and turn to full table scanning
  • The storage engine cannot use the column to the right of the range condition in the index
  • Use override indexes whenever possible (queries that access only the index (index columns are consistent with query columns), and reduce select *
  • Failure to use an index when mysql is using an unequal (!=or <>) results in a full table scan
  • is null, is not null also cannot use an index
  • like starts with a wildcard ('%abc...') mysql index failure becomes a full table scan operation
    • Solution: like%plus right
    • Correct writing: where a = 3 and b like'kk%'and c = 4;
  • String Indexing Without Single Quotes Failed
  • Less or, index fails when used to connect

Tips:

Full-time match my favorite, the leftmost embellishment to follow
Leading brothers can't die, middle brothers can't break
Less calculation on index columns, total invalidation after range
like percentage write rightmost, override index write no star
Unequal null values or, less index failure
var quotation marks are not lost, and SQL advanced is not difficult

14. Index optimization interview questions

create table test03 (
	id int primary key not null auto_increment,
    c1 char(10),
    c2 char(10),
    c3 char(10),
    c4 char(10),
    c5 char(10)
);

insert into test03(c1,c2,c3,c4,c5) values('a1','a2','a3','a4','a5');
insert into test03(c1,c2,c3,c4,c5) values('b1','b2','b3','b4','b5');
insert into test03(c1,c2,c3,c4,c5) values('c1','c2','c3','c4','c5');
insert into test03(c1,c2,c3,c4,c5) values('d1','d2','d3','d4','d5');
insert into test03(c1,c2,c3,c4,c5) values('e1','e2','e3','e4','e5');

select * from test03;

create index idx_test03_c1234 on test03(c1,c2,c3,c4);
show index from test03;

Question: Index usage under the following sql analysis

1. explain select * from test03 where c1='a1' and c2='a2' and c3='a3' and c4='a4';
2. explain select * from test03 where c1='a1' and c2='a2' and c4='a4' and c3='a3';
3. explain select * from test03 where c4='a4' and c3='a3' and c2='a2' and c1='a1';
4. explain select * from test03 where c1='a1' and c2='a2' and c3>'a3' and c4='a4';
5. explain select * from test03 where c1='a1' and c2='a2' and c4>'a4' and c3='a3';
6. explain select * from test03 where c1='a1' and c2='a2' and c4='a4' order by c3;
7. explain select * from test03 where c1='a1' and c2='a2' order by c3;
8. explain select * from test03 where c1='a1' and c2='a2' order by c4;
9. explain select * from test03 where c1='a1' and c5='a5' order by c2,c3;
10. explain select * from test03 where c1='a1' and c5='a5' order by c3,c2;
11. explain select * from test03 where c1='a1' and c2='a2' order by c2,c3;
12. explain select * from test03 where c1='a1' and c2='a2' order by c3,c2;

First sql: normal

Section 2 sql: As with Section 1, this is normal because mysql is automatically tuned, but this is not recommended

Article 3 sql: As with the first one, this is normal because mysql is automatically tuned, but this is not recommended

Article 4 sql: Problem, type is range, ref is null, using index condition, only three indexes are used, because after the range is completely invalid.

Article 5 The sql:type is range, ref is null, using index condition. Four indexes are used because the bottom layer of mysql is automatically tuned to put C3 in front of C4.

Article 6 sql:type is ref, only C1 and C2 indexes are used. In fact, c3 is also used, but only c3 sorting function is used, so it is not included in explain lain.

Article 7 sql: Same as Article 6, but filtered is only 20%

Article 8 sql: c1, c2 indexes are used, but there are sorting errors within the file because c4 written directly by c3 is not written because it violates the best left prefix rule

Article 9 sql: only c1 field index, c2,c3 for sorting, no filesort.

Article 10 sql: filesort appears because it violates the best left prefix rule.

Article 11 sql: only c1,c2 and C3 field indexes are used for sorting, no filesort.

Article 12 sql: There is no filesort because there is a c2 field index in front, so there is no filesort in the following group by. Recommendation compared with Article 10.

  • group by causes sorting, and temporary table generation if it is confused

15. Query Interception Analysis

Principles for query optimization:

  • Small Table Drives Large Table
select * from tbl_emp;  -- Big Table
select * from tbl_dept; -- Small Table

explain select * from tbl_emp e where e.deptId in (select id from tbl_dept d);
-- When tbl_dept The dataset is smaller than tbl_emp For datasets, use in Superior to exists
explain select * from tbl_emp e where exists (select 1 from tbl_dept d where d.id=e.deptId);
-- When tbl_emp The dataset is smaller than tbl_dept For datasets, use exists Superior to in

/**
mysql Differences between exists and in:
1. in Is to hash the outer table and the inner table, query the inner table first, that is, query the table in parentheses first;
2. exists A loop is made on the outer surface, after which each item of the outer surface is taken to validate the inner table to see if each item of the outer surface exists in the inner table. That is, an EXISTS() query validates the results of the main query by putting the result set of the main query into a subquery, and decides whether the data results of the main query can be saved or not based on the validation result true or false.
It is faster to use in when the exterior is large, and faster to use exists when the interior is large.
**/
  • Order by sort optimization

    • ORDER BY satisfies two scenarios, sorting using Index:

      • ORDER BY statement uses the leftmost front column of an index
      • Use Where clause and ORDER BY clause conditional column combination to satisfy the leftmost front column of the index
    • Ways to improve order by:

      • Use order by taboo use select *, should only query required fields

        • When the sum of the size of the fields in the query is <max_ Length_ For_ Sort_ If data and the sort field is not text/BLOB, an improved algorithm, one-way sort, is used, otherwise the use of alignment sort is slow

        • Single-way and multiple-way may exceed sort_buffer capacity (more likely to be one-way), temporary will be created when exceeded, resulting in slow speed

      • Try to improve sort_buffer_size

      • Try to improve max_length_for_sort_data, but set too high to exceed sort_buffer_size

  • group by: same order by, but not with where

16. Open slow sql log

  • On: set global slow_query_log=1;

  • To see if it's turned on: show variables like'%slow_query_log%';

  • View slow query setup time: show variables like'%long_query_time%';

  • Set threshold to 1 second: set global long_query_time=1;

  • Show the slow number of sql entries in the current system: show global status like'%show_queries%';

  • If sql is executed beyond the threshold, it will be displayed in a slow query: select sleep(2);

17 Batch Data Script - Insert 50W pieces of data into the table

create database bigData;
use bigData;

-- Department table
create table dept(
	id int unsigned primary key auto_increment,
    deptno mediumint unsigned not null default 0,
    dname varchar(20) not null default "",
    loc varchar(13) not null default ""
)engine=innodb default charset=GBK;

-- Employee Sheet
create table emp(
	id int unsigned primary key auto_increment,
    empno mediumint unsigned not null default 0,/*number*/
    ename varchar(20) not null default "",/*Name*/
    job varchar(9) not null default "",/*work*/
    mgr mediumint unsigned not null default 0,/*mgr*/
    hiredate Date not null,/*Enrollment Time*/
    sal decimal(7,2) not null,/*salary*/
    comm decimal(7,2) not null,/*dividend*/
    deptno mediumint unsigned not null default 0 /*Department number*/
)engine=innodb default charset=GBK;

-- This value should be set to ON
show variables like 'log_bin_trust_function_creators';

-- Random String Generation
DELIMITER $$
create function rand_string(n int) returns varchar(255)
begin
	declare chars_str varchar(100) default 'abcdefghijklmnopqrstuvwxyzABCDEFGHIGKLMNOPQRSTUVWXYZ';
    declare return_str varchar(255) default '';
    declare i int default 0;
    while i < n do
    set return_str=concat(return_str,SUBSTRING(chars_str,FLOOR(1+RAND()*52),1));
    set i = i + 1;
    end while;
    return return_Str;
end $$

-- Random Generation of Department Numbers
DELIMITER $$
create function rand_num()
returns int(5)
begin
	declare i int default 0;
    set i = floor(100+rand()*10);
	return i;
end $$

-- If you want to delete
-- drop function rand_num;

-- Create Stored Procedures
DELIMITER $$
create procedure insert_emp(in start int(10), in max_num int(10))
begin
	declare i int default 0;
    -- set autocommit = 0;
    repeat
    set i = i + 1;
    insert into emp(empno, ename, job, mgr, hiredate, sal, comm, deptno) values ((start+i), rand_string(6), 'SALESMAN', 0001, curdate(), 2000, 400, rand_num());
    until i = max_num
    end repeat;
    commit;
end $$

-- Create a stored procedure to dept Table Add Random Data
DELIMITER $$
create procedure insert_dept(in start int(10), in max_num int(10))
begin
	declare i int default 0;
    set autocommit = 0;
    repeat
    set i = i + 1;
    insert into dept(deptno, dname, loc) values ((start + i), rand_string(10),rand_string(8));
    until i = max_num
    end repeat;
    commit;
end $$
-- drop procedure insert_dept;

-- Call Stored Procedure
DELIMITER ;
CALL insert_dept(100, 10);

DELIMITER ;
CALL insert_emp(100001, 500000);

18. Show Profile

  • Show Profile is provided by mysql to analyze resource consumption for statement execution in the current session. Measurement that can be used for SQL tuning

  • By default, the parameters are turned off and the results of the last 15 runs are saved

show variables like '%profiling%';
set profiling = on;
show profiles;
show profile cpu, block io for query 17;#A full life cycle of execution within sql
/**
#Decrease speed if status has the following entries
1 coverting HEAP to MyISAM :The result of the query is too big, there is not enough memory to move to disk
2 creating tpm table : Create temporary tables, [copy data to temporary tables] [run out and delete]
3 cooying to tmp table on disk : Copy temporary tables in memory to disk. DANGER!! 
4 locked
**/

19. Mysql Lock Mechanism

  • Classification of locks
    • From the granularity of data operations:

      • Table Lock
      • Row lock
    • From the type of data operation:

      • Read Lock (Shared Lock): For the same data, multiple read operations can occur simultaneously without interacting
      • Write lock (exclusive lock): It blocks other write and read locks until the current write lock is complete
      • Summary: Read Lock Blocking Tables, Write Lock Blocking Read and Write
use test01;
show open tables from test01;
lock table tbla read, book write;
unlock tables; -- Release lock
select * from tbla;
  • Table Lock (MyISAM uses table locks by default)
    • Lock table: LOCK TABLES tbl_ Name {READ | WRITE}, [tbl_name {READ | WRITE},...]
    • Table Lock: Level operation A record also locks the entire table, not suitable for high concurrent operation
  • Row locks (innodb features)
    • Lock one line: sql statement + for update;
    • Prerequisites for row locks: use indexes, use transactions
    • Row locks change table locks after an index fails
  • Gap lock
    • With transaction A and transaction B, transaction A adds an interval lock of 10-15 to the database table, in which case insert id = 12 data is locked and cannot be executed because of interval lock 10-15, even if the 12 ID does not actually exist in the table, it will still be locked.

Posted by twoeyes on Thu, 02 Jun 2022 20:14:28 +0530