EXPLAIN FORMAT=JSON: Everything About attached_subqueries, optimized_away_subqueries, and materialized_from_subquery
EXPLAIN FORMAT=JSON provides some great details on subqueries optimization. Read on to learn more.
Join the DZone community and get the full member experience.
Join For FreeOrignially written by Sveta Smirnova
EXPLAIN FORMAT=JSON
The regular EXPLAIN command already provides some information about subquery optimization. For example, you can find out if the subquery is dependent or not, and (since version 5.6) if it was materialized:
mysql> explain select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null)G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: departments
partitions: NULL
type: index
possible_keys: PRIMARY
key: dept_name
key_len: 42
ref: NULL
rows: 9
filtered: 100.00
Extra: Using where; Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: <subquery2>
partitions: NULL
type: eq_ref
possible_keys: <auto_key>
key: <auto_key>
key_len: 4
ref: employees.departments.dept_no
rows: 1
filtered: 100.00
Extra: NULL
*************************** 3. row ***************************
id: 2
select_type: MATERIALIZED
table: dept_manager
partitions: NULL
type: ALL
possible_keys: dept_no
key: NULL
key_len: NULL
ref: NULL
rows: 24
filtered: 90.00
Extra: Using where
3 rows in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` semi join (`employees`.`dept_manager`) where ((`<subquery2>`.`dept_no` = `employees`.`departments`.`dept_no`) and (`employees`.`dept_manager`.`to_date` is not null))
However, you can’t find details on exactly how this subquery was materialized. To find out more, use EXPLAIN FORMAT=JSON:
mysql> explain format=json select dept_name from departments where dept_no in (select dept_no from dept_manager where to_date is not null)G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "16.72"
},
"nested_loop": [
{
"table": {
"table_name": "departments",
<skipped>
},
{
"table": {
"table_name": "<subquery2>",
"access_type": "eq_ref",
"key": "<auto_key>",
"key_length": "4",
"ref": [
"employees.departments.dept_no"
],
"rows_examined_per_scan": 1,
"materialized_from_subquery": {
"using_temporary_table": true,
"query_block": {
"table": {
"table_name": "dept_manager",
"access_type": "ALL",
"possible_keys": [
"dept_no"
],
"rows_examined_per_scan": 24,
"rows_produced_per_join": 21,
"filtered": "90.00",
"cost_info": {
"read_cost": "1.48",
"eval_cost": "4.32",
"prefix_cost": "5.80",
"data_read_per_join": "345"
},
"used_columns": [
"dept_no",
"to_date"
],
"attached_condition": "(`employees`.`dept_manager`.`to_date` is not null)"
}
}
}
}
}
]
}
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`departments`.`dept_name` AS `dept_name` from `employees`.`departments` semi join (`employees`.`dept_manager`) where ((`<subquery2>`.`dept_no` = `employees`.`departments`.`dept_no`) and (`employees`.`dept_manager`.`to_date` is not null))
We can see here that the new temporary table ("using_temporary_table": true) with internal name <subquery2> was materilized from the subquery (materialized_from_subquery). We also see how this new temporary table was accessed.
We also can see that the member that contains information about the materialized subquery is part of the nested_loop array. Nested loop optimization applies to a regular JOIN command. In this case, it means that the MySQL Server would join the temporary table with an outer query.
Another interesting case is optimized_away_subqueries: this command shows subqueries that were executed only once and were replaced by their result. A regular EXPLAIN command provides no information about such optimization:
mysql> explain select emp_no, salary from salaries order by (select max(salary) from salaries)G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: salaries
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2838525
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 2
select_type: SUBQUERY
table: salaries
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 2838525
filtered: 100.00
Extra: NULL
2 rows in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no`,`employees`.`salaries`.`salary` AS `salary` from `employees`.`salaries` order by (/* select#2 */ select max(`employees`.`salaries`.`salary`) from `employees`.`salaries`)
We can see that in select_type: SUBQUERY, the output of SHOW WARNINGS doesn’t give us much insight either.
However, EXPLAIN FORMAT=JSON tells us how this subquery was optimized:
mysql> explain format=json select emp_no, salary from salaries order by (select max(salary) from salaries)G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "573505.00"
},
"ordering_operation": {
"using_filesort": false,
"table": {
"table_name": "salaries",
<skipped>
"optimized_away_subqueries": [
{
"dependent": false,
"cacheable": true,
"query_block": {
"select_id": 2,
"cost_info": {
"query_cost": "573505.00"
},
"table": {
"table_name": "salaries",
"access_type": "ALL",
"rows_examined_per_scan": 2838525,
"rows_produced_per_join": 2838525,
"filtered": "100.00",
"cost_info": {
"read_cost": "5800.00",
"eval_cost": "567705.00",
"prefix_cost": "573505.00",
"data_read_per_join": "43M"
},
"used_columns": [
"salary"
]
}
}
}
]
}
}
}
1 row in set, 1 warning (0.00 sec)
After seeing optimized_away_subqueries and "cacheable": true, we know that the subquery was executed only once, and the result was cached and used to resolve the outer query.
Another subquery type that I want to mention in this post is attached_subqueries which covers subqueries that are not converted to JOIN.
mysql> explain format=json select emp_no from salaries where salary > (select avg(salary) from salaries)G
*************************** 1. row ***************************
EXPLAIN: {
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "250601.60"
},
"table": {
"table_name": "salaries",
"access_type": "ALL",
"rows_examined_per_scan": 1240668,
"rows_produced_per_join": 413514,
"filtered": "33.33",
"cost_info": {
"read_cost": "167898.67",
"eval_cost": "82702.92",
"prefix_cost": "250601.60",
"data_read_per_join": "6M"
},
"used_columns": [
"emp_no",
"salary"
],
"attached_condition": "(`employees`.`salaries`.`salary` > (/* select#2 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))",
"attached_subqueries": [
{
"dependent": false,
"cacheable": true,
"query_block": {
"select_id": 2,
"cost_info": {
"query_cost": "250601.80"
},
"table": {
"table_name": "salaries",
"access_type": "ALL",
"rows_examined_per_scan": 1240669,
"rows_produced_per_join": 1240669,
"filtered": "100.00",
"cost_info": {
"read_cost": "2468.00",
"eval_cost": "248133.80",
"prefix_cost": "250601.80",
"data_read_per_join": "18M"
},
"used_columns": [
"salary"
]
}
}
}
]
}
}
}
1 row in set, 1 warning (0.00 sec)
Note (Code 1003): /* select#1 */ select `employees`.`salaries`.`emp_no` AS `emp_no` from `employees`.`salaries` where (`employees`.`salaries`.`salary` > (/* select#2 */ select avg(`employees`.`salaries`.`salary`) from `employees`.`salaries`))
For these subqueries, using this option shows if they are dependent and/or cacheable. There is also a query_block member inside the attached_subqueries object which provides all the details on how the query was optimized.
Conclusion: EXPLAIN FORMAT=JSON provides some great details on subqueries optimization.
Published at DZone with permission of Peter Zaitsev, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments