8 MySQL aggregate (multiline) function
Basic aggregate function
AVG and SUM functions
MIN and MAX functions
COUNT function
-
Who can use count(*), count(1), count (column name)?
There is no difference for the tables of the MyISAM engine. This engine has a counter inside to maintain the number of rows.
The InnoDB engine table uses count(*),count(1) to directly read the number of rows. The complexity is O(n), because InnoDB really needs to count them. But it is better than the specific count (column name).
-
Can you replace count(*) with count (column name)?
Do not use count (column name) instead of count(*). count(*) is the syntax of the standard statistics row number defined by SQL92. It has nothing to do with the database, NULL and non NULL
-
Note: count(*) will count the rows with NULL values, while count (column name) will not count the rows with NULL values.
-- 1. 5 Large aggregate function -- ---------------------- ---------------------- ---------------------- -- 1.1 AVG and SUM SELECT AVG(salary), SUM(salary) FROM employees; -- 1.2 MIN and MAX SELECT last_name, salary FROM employees WHERE salary IN (SELECT MIN(salary) FROM employees) OR salary IN (SELECT MAX(salary) FROM employees); -- 1.3 Count SELECT COUNT(*), COUNT(employee_id), COUNT(salary), COUNT(salary * 2), COUNT(1), COUNT(2) FROM employees WHERE salary BETWEEN 6000 AND 10000; # Sum will sum all values, while AVG ignores null # SUM AVG MIN MAX will ignore null SELECT AVG(salary), SUM(salary) / COUNT(*), AVG(commission_pct), SUM(commission_pct) / COUNT(commission_pct), SUM(commission_pct) / COUNT(*) FROM employees; # Query the average bonus rate in the company SELECT SUM(commission_pct) / COUNT(*), AVG(IFNULL(commission_pct, 0)) FROM employees; # Count (*) count (1) count (field) is required to count the number of records in the table. Efficiency comparison # If you use the MyISAM storage engine, the efficiency of the three is the same O(1) # If the InnoDB storage engine is used, the efficiency of the three: count (*) = count (1) > count (field)
GROUP BY
-
Clear: WHERE must be placed after FROM
-
All columns in the SELECT list that are not included in the group function should be included in the GROUP BY clause
-
Columns included in the GROUP BY clause do not have to be included in the SELECT list
-
After using the WITH ROLLUP keyword, add a record after all the queried grouping records. This record calculates the sum of all the queried records, that is, counts the number of records.
Note: when using ROLLUP, you cannot use the ORDER BY clause to sort the results at the same time, that is, ROLLUP and ORDER BY are mutually exclusive.
SELECT column, group_function(column) FROM table [WHERE condition] [GROUP BY group_by_expression] [ORDER BY column];
-- 2. Group By -- ---------------------- ---------------------- ---------------------- SELECT department_id, COUNT(*), SUM(salary) AS "dept_total_sal" FROM employees WHERE salary BETWEEN 6000 AND 10000 GROUP BY department_id HAVING department_id IN (50, 60, 70, 80) ORDER BY dept_total_sal DESC; SELECT department_id, AVG(salary) AS "dept_Avg_sal", SUM(salary) AS "dept_total_sal" FROM employees GROUP BY department_id; SELECT department_id, MAX(salary), MIN(salary) FROM employees GROUP BY department_id; # The two are the same # The fields of non group functions in Select must be declared in group by (otherwise, an error will be reported in original) # Fields declared in Group by may not be selected in Select SELECT department_id, job_id, AVG(salary) FROM employees GROUP BY department_id, job_id; SELECT department_id, job_id, AVG(salary) FROM employees GROUP BY job_id, department_id; # In mysql, group by uses with rollup, but does not participate in order by SELECT department_id, AVG(salary) FROM employees GROUP BY department_id WITH ROLLUP;
Having
Basic use
-
Filter grouping: HAVING clause
-
Rows have been grouped.
-
Aggregate function used.
-
Groups that meet the conditions in the HAVING clause are displayed.
-
HAVING cannot be used alone. It must be used together with GROUP BY.
-
-
Illegal use of aggregate function: aggregate function cannot be used in WHERE clause. As follows:
SELECT column, group_function(column) FROM table [WHERE condition] [GROUP BY group_by_expression] [HAVING group_condition] [ORDER BY column];
WHERE vs. HAVING
- Difference 1: WHERE can directly use the fields in the table as the filter criteria, but cannot use the calculation function in the grouping as the filter criteria: HAVING must be used in conjunction with GROUP BY, and the grouping calculation function and grouping field can be used as the filter criteria.
This determines that when grouping statistics is required for data, HAVING can complete tasks that WHERE cannot. This is because in the query syntax structure, WHERE is before GROUP BY, so the grouping results cannot be filtered. After HAVING GROUP BY, you can use the grouping field and the calculation function in the grouping to filter the grouping result set. This function cannot be completed by WHERE. In addition, the records excluded by WHERE are no longer included in the group.
- Difference 2: if you need to obtain the required data from the associated table through connection, WHERE is filtering before connection, while HAVING is filtering after connection. This determines that WHERE is more efficient than HAVING in association queries. Because WHERE can be filtered first and connected with a small filtered dataset and associated table, it takes less resources and has high execution efficiency. HAVING needs to prepare the result set first, that is, associate with the unfiltered data set, and then filter the large data set, which takes up more resources and has low execution efficiency.
- WHERE and HAVING are not mutually exclusive. We can use WHERE and HAVING in a query at the same time. The condition containing the grouping statistical function is "HAVING", and the common condition is "WHERE". In this way, we not only make use of the efficiency of WHERE conditions, but also give play to the advantage that HAVING can use query conditions containing grouping statistical functions. When the amount of data is very large, the operation efficiency will be very different.
-- 3. Having - Filter grouped data -- where Filter data in query tables, where Aggregate functions cannot be used in -- ---------------------- ---------------------- ---------------------- SELECT department_id, MAX(salary) AS "max_sal" FROM employees GROUP BY department_id HAVING max_sal BETWEEN 8000 AND 15000; # Mode 1 is more efficient than mode 2 SELECT department_id, MAX(salary) AS "max_sal" FROM employees WHERE department_id IN (10, 20, 30, 40) GROUP BY department_id HAVING max_sal > 10000; SELECT department_id, MAX(salary) AS "max_sal" FROM employees GROUP BY department_id HAVING max_sal > 10000 AND department_id IN (10, 20, 30, 40); # Conclusion: when there is an aggregate function in the filter condition, the filter condition must be declared in having # When there is no aggregate function in the filter condition, the filter condition can be declared in where or having, but it is recommended to declare where again
SELECT execution process
Query structure
#Mode 1: SELECT ...,....,... FROM ...,...,.... WHERE Connection conditions of multiple tables AND Filter conditions without group functions GROUP BY ...,... HAVING Filter conditions with group functions ORDER BY ... ASC/DESC LIMIT ...,... #Mode 2: SELECT ...,....,... FROM ... JOIN ... ON Connection conditions of multiple tables JOIN ... ON ... WHERE Filter conditions without group functions AND/OR Filter conditions without group functions GROUP BY ...,... HAVING Filter conditions with group functions ORDER BY ... ASC/DESC LIMIT ...,... #Of which: #(1) From: from which tables to filter #(2) on: Descartes product is removed when associating multi table queries #(3) where: filter criteria from tables #(4) Group by: group by #(5) having: filter again in the statistical results #(6) order by: sort #(7) limit: paging
SELECT execution sequence
-
The order of keywords cannot be reversed:
SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT...
-
The execution order of the SELECT statement (in MySQL and Oracle, the SELECT execution order is basically the same):
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT Fields for -> DISTINCT -> ORDER BY -> LIMIT
-
For example, if you write an SQL statement, its keyword order and execution order are as follows:
SELECT DISTINCT player_id, player_name, count(*) as num # Sequence 5 FROM player JOIN team ON player.team_id = team.team_id # Sequence 1 WHERE height > 1.80 # Sequence 2 GROUP BY player.team_id # Sequence 3 HAVING num > 2 # Sequence 4 ORDER BY num DESC # Sequence 6 LIMIT 2 # Sequence 7
When the SELECT statement executes these steps, each step will generate a virtual table, and then pass this virtual table to the next step as input. It should be noted that these steps are hidden in the SQL execution process and are invisible to us.
How SQL executes
SELECT executes the FROM step first. At this stage, if you want to perform an associated query on multiple tables, you will go through the following steps:
- First, find the Cartesian product through CROSS JOIN, which is equivalent to getting the virtual table vt (virtual table) 1-1;
- Filter through ON, and filter ON the basis of virtual table vt1-1 to obtain virtual table vt1-2;
- Add an external row. If we use left link, right link or full link, external rows will be involved, that is, in the virtual
Add external rows to table vt1-2 to get virtual table vt1-3.
Of course, if we operate on more than two tables, we will repeat the above steps until all tables are processed. This process yields our raw data.
When we get the original data of the query data table, that is, the final virtual table vt1, we can proceed to the WHERE phase on this basis. In this stage, the results of the vt1 table will be filtered to get the virtual table vt2.
Then go to step 3 and step 4, that is, the GROUP and HAVING phases. In this stage, the virtual tables vt3 and vt4 are obtained by grouping and grouping filtering based on the virtual table vt2.
After we complete the condition filtering section, we can filter the fields extracted from the table, that is, enter SELECT and DISTINCT
Phase.
First, the desired fields will be extracted in the SELECT phase, and then the duplicate rows will be filtered out in the DISTINCT phase to obtain the intermediate virtual tables vt5- 1 and vt5- 2 respectively.
After we extract the desired field data, we can sort according to the specified fields, that is, the ORDER BY stage, to get the virtual table vt6.
Finally, on the basis of vt6, take out the records of the specified row, that is, the LIMIT stage, and get the final result, corresponding to the virtual table vt7.
Of course, when we write a SELECT statement, not all keywords may exist, and the corresponding stage will be omitted.
At the same time, because SQL is a structured query language similar to English, we should pay attention to the corresponding keyword order when writing the SELECT statement. The so-called underlying operation principle is the execution order we just talked about.
Exercise
-- 4. Exercise -- ---------------------- ---------------------- ---------------------- #1. Can the where clause use group functions for filtering? # NO! #2. query the maximum value, minimum value, average value and total of the company's employee wages SELECT MAX(salary), MIN(salary), AVG(salary), SUM(salary) FROM employees; #3. query each job_ The maximum, minimum, average, and sum of employee wages of ID SELECT MAX(salary), MIN(salary), AVG(salary), SUM(salary) FROM employees GROUP BY job_id; #4. select each job_ Number of employees with ID SELECT job_id, COUNT(*) FROM employees GROUP BY job_id; # 5. query the DIFFERENCE between the maximum wage and the minimum wage (DIFFERENCE) SELECT MAX(salary), MIN(salary), MAX(salary) - MIN(salary) FROM employees; # 6. query the minimum wage of employees under each manager. The minimum wage cannot be less than 6000, and employees without managers are not included SELECT emp.manager_id, MIN(emp.salary) FROM employees emp JOIN employees mgr ON emp.manager_id = mgr.employee_id GROUP BY emp.manager_id HAVING MIN(emp.salary) >= 6000; # 7. query the names and locations of all departments_ ID, number of employees and average salary, in descending order of average salary SELECT department_name, d.location_id, COUNT(*), AVG(salary) FROM employees e JOIN departments d ON e.department_id = d.department_id JOIN locations l ON d.location_id = l.location_id GROUP BY d.department_id ORDER BY avg(salary) DESC ; # 8. query the Department name, work type name and minimum wage of each type of work and each department SELECT department_name, job_id, min(salary) FROM departments d LEFT JOIN employees e ON e.department_id = d.department_id GROUP BY department_name, job_id;