SQL Query Performance Tuning in MySQL
Speed up SQL queries with indexing in MySQL. Install, analyze queries, and use stored procedures for optimal results.
Join the DZone community and get the full member experience.
Join For FreeIn this article, we are going to learn how indexing table columns can help in the quick response time of SQL queries. We'll cover the steps for installing MySQL, creating stored procedures, analyzing queries, and understanding the impact of indexing.
I have used MySQL version 8 on Ubuntu. Also, I used the Dbeavor tool as a MySQL client for connecting to the MySQL server. So let's learn together.
I have used MySQL for demonstration purposes; however, the concept remains the same in all other databases as well.
1. Following way, we can install MySQL and access it using the root user. This MySQL instance is just for testing; thus, I used a simple password.
$ sudo apt install mysql-server
$ sudo systemctl start mysql.service
$ sudo mysql
mysql> SET GLOBAL validate_password.policy = 0;
mysql> ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'password';
mysql> exit
$ mysql -uroot -ppassword
2. Create a db and use it.
mysql> create database testdb;
mysql> show databases;
mysql> use testdb;
3. Create two tables, employee1, and employee2. Here employee1 is without primary-key, and employee 2 is with primary-key.
mysql> CREATE TABLE employee1 (id int,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255));
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE employee2 (id int primary key,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255));
Query OK, 0 rows affected (0.02 sec
mysql> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| employee1 |
| employee2 |
+------------------+
2 rows in set (0.00 sec)
4. Now, if we check the index for each table, we find that the employee2 table already has one index on the id column because it is a primary key.
mysql> SHOW INDEXES FROM employee1 \G;
Empty set (0.00 sec)
ERROR:
No query specified
mysql> SHOW INDEXES FROM employee2 \G;
*************************** 1. row ***************************
Table: employee2
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 0
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
1 row in set (0.00 sec)
ERROR:
No query specified
5. Now, create a stored procedure to insert bulk data in both tables. We are inserting 20000 records in each table. We can then invoke the stored procedure with CALL procedure-name command.
mysql>
CREATE PROCEDURE testdb.BulkInsert()
BEGIN
DECLARE i INT DEFAULT 1;
truncate table employee1;
truncate table employee2;
WHILE (i <= 20000) DO
INSERT INTO testdb.employee1 (id, FirstName, Address) VALUES(i, CONCAT("user","-",i), CONCAT("address","-",i));
INSERT INTO testdb.employee2 (id,FirstName, Address) VALUES(i,CONCAT("user","-",i), CONCAT("address","-",i));
SET i = i+1;
END WHILE;
END
mysql> CALL testdb.BulkInsert() ;
mysql> SELECT COUNT(*) from employee1 e ;
COUNT(*)|
--------+
20000|
mysql> SELECT COUNT(*) from employee2 e ;
COUNT(*)|
--------+
20000|
6. Now, if we select the record for any random id, we find the response from the employee1 table is slow as it doesn't have any index.
mysql> select * from employee2 where id = 15433;
+-------+----------+------------+---------------+---------+
| id | LastName | FirstName | Address | profile |
+-------+----------+------------+---------------+---------+
| 15433 | NULL | user-15433 | address-15433 | NULL |
+-------+----------+------------+---------------+---------+
1 row in set (0.00 sec)
mysql> select * from employee1 where id = 15433;
+-------+----------+------------+---------------+---------+
| id | LastName | FirstName | Address | profile |
+-------+----------+------------+---------------+---------+
| 15433 | NULL | user-15433 | address-15433 | NULL |
+-------+----------+------------+---------------+---------+
1 row in set (0.03 sec)
mysql> select * from employee1 where id = 19728;
+-------+----------+------------+---------------+---------+
| id | LastName | FirstName | Address | profile |
+-------+----------+------------+---------------+---------+
| 19728 | NULL | user-19728 | address-19728 | NULL |
+-------+----------+------------+---------------+---------+
1 row in set (0.03 sec)
mysql> select * from employee2 where id = 19728;
+-------+----------+------------+---------------+---------+
| id | LastName | FirstName | Address | profile |
+-------+----------+------------+---------------+---------+
| 19728 | NULL | user-19728 | address-19728 | NULL |
+-------+----------+------------+---------------+---------+
1 row in set (0.00 sec)
mysql> select * from employee1 where id = 3456;
+------+----------+-----------+--------------+---------+
| id | LastName | FirstName | Address | profile |
+------+----------+-----------+--------------+---------+
| 3456 | NULL | user-3456 | address-3456 | NULL |
+------+----------+-----------+--------------+---------+
1 row in set (0.04 sec)
mysql> select * from employee2 where id = 3456;
+------+----------+-----------+--------------+---------+
| id | LastName | FirstName | Address | profile |
+------+----------+-----------+--------------+---------+
| 3456 | NULL | user-3456 | address-3456 | NULL |
+------+----------+-----------+--------------+---------+
1 row in set (0.00 sec)
7. Now check the output of the command EXPLAIN ANALYZE. This command actually executes the query and plans the query, instruments it, and executes it while counting rows and measuring time spent at various points in the execution plan.
Here we find for employee1, there is a table scan performed, which means a full table is scanned or searched to fetch output. We also call it a full scan of the table.
mysql> explain analyze select * from employee1 where id = 3456;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee1.id = 3456) (cost=1989 rows=1965) (actual time=5.24..29.3 rows=1 loops=1)
-> Table scan on employee1 (cost=1989 rows=19651) (actual time=0.0504..27.3 rows=20000 loops=1)
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)
# Here is detailed explanation from ChatGPT.
Filter: (employee1.id = 3456): This indicates that there is a filter operation being performed on the "employee1" table, and only rows where the "id" column has a value of 3456 will be selected.
(cost=1989 rows=1965) (actual time=5.3..31.9 rows=1 loops=1): This part provides some performance-related information about the query execution:
cost=1989: It represents the cost estimate for the entire query execution. Cost is a relative measure of how much computational effort is required to execute the query.
rows=1965: It indicates the estimated number of rows that will be processed in this part of the query.
actual time=5.3..31.9: This shows the actual time taken for this part of the query to execute, which is measured in milliseconds.
rows=1 loops=1: The number of times this part of the query is executed in a loop.
-> Table scan on employee1 (cost=1989 rows=19651) (actual time=0.034..29.7 rows=20000 loops=1): This part shows that a table scan is being performed on the "employee1" table:
Table scan: This means that the database is scanning the entire "employee1" table to find the rows that match the filter condition.
cost=1989: The cost estimate for this table scan operation.
rows=19651: The estimated number of rows in the "employee1" table.
actual time=0.034..29.7: The actual time taken for the table scan operation, measured in milliseconds.
rows=20000 loops=1: The number of times this table scan operation is executed in a loop.
Overall, this query plan suggests that the database is executing a query that filters the "employee1" table to only return rows where the "id" column is equal to 3456.
The table scan operation reads a total of 20,000 rows to find the matching row(s) and has an estimated cost of 1989 units.
The actual execution time is 5.3 to 31.9 milliseconds, depending on the number of rows that match the filter condition.
8. For table employee2, we find that only one row is searched, and results are fetched. Thus if there are a lot of records in tables, we will observe considerable improvement in the response time of SQL queries.
mysql> explain analyze select * from employee2 where id = 3456;
+---------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------+
| -> Rows fetched before execution (cost=0..0 rows=1) (actual time=110e-6..190e-6 rows=1 loops=1)
|
+---------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
# As per ChatGPT explanation of this query plan is :
Rows fetched before execution: This part indicates that the database is fetching some data before the main query is executed.
(cost=0..0 rows=1): The cost estimate for this operation is 0 units, and it expects to fetch only one row.
(actual time=110e-6..190e-6 rows=1 loops=1): This provides the actual time taken for the data fetching operation:
actual time=110e-6..190e-6: The actual time range for the fetching operation, measured in microseconds (µs).
rows=1: The number of rows fetched.
loops=1: The number of times this data fetching operation is executed in a loop.
Overall, this part of the query plan indicates that the database is fetching a single row before executing the main query.
The actual time taken for this data fetching operation is in the range of 110 to 190 microseconds. This preliminary data fetch might be related to obtaining some essential information or parameters needed for the subsequent execution of the main query.
9. Now, let us make it more interesting. Let us analyze the query plan when we search a record for the non-index column FirstName on both tables. From the output, we find that Table Scan is done to search records which takes considerable time to fetch data.
mysql> explain analyze select * from employee2 where FirstName = 'user-13456';
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee2.FirstName = 'user-13456') (cost=2036 rows=2012) (actual time=15.7..24 rows=1 loops=1)
-> Table scan on employee2 (cost=2036 rows=20115) (actual time=0.0733..17.8 rows=20000 loops=1)
|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)
mysql> explain analyze select * from employee1 where FirstName = 'user-13456';
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee1.FirstName = 'user-13456') (cost=1989 rows=1965) (actual time=23.7..35.2 rows=1 loops=1)
-> Table scan on employee1 (cost=1989 rows=19651) (actual time=0.0439..28.9 rows=20000 loops=1)
|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.03 sec)
10. Now, let us create an index on the employee1 table for column FirstName.
mysql> CREATE INDEX index1 ON employee1 (FirstName);
Query OK, 0 rows affected (0.13 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show indexes from employee1 \G;
*************************** 1. row ***************************
Table: employee1
Non_unique: 1
Key_name: index1
Seq_in_index: 1
Column_name: FirstName
Collation: A
Cardinality: 19651
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
1 row in set (0.01 sec)
ERROR:
No query specified
11. Let us now again check the query plan for both tables when we search a single record for column FirstName. We find that employee1 quickly provides a response, there is only 1 row to search, and index lookup is done on the employee1 table when using the index on column FirstName. But for employee2, the response time is large, and all 20000 rows are searched to get a response.
mysql> explain analyze select * from employee1 where FirstName = 'user-13456';
+-------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-------------------------------------------------------------------------------------------------------------------------------------+
| -> Index lookup on employee1 using index1 (FirstName='user-13456') (cost=0.35 rows=1) (actual time=0.0594..0.0669 rows=1 loops=1)
|
+-------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain analyze select * from employee2 where FirstName = 'user-13456';
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (employee2.FirstName = 'user-13456') (cost=2036 rows=2012) (actual time=15.7..23.5 rows=1 loops=1)
-> Table scan on employee2 (cost=2036 rows=20115) (actual time=0.075..17.5 rows=20000 loops=1)
|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)
That's it, guys. This article will help us to understand the impact of indexes on tables. How to analyze queries with explain analyze command. Also, there is learning about how to set up MySQL and how to write stored procedures for bulk insert.
Opinions expressed by DZone contributors are their own.
Comments