SQL Query Performance Tuning in MySQL: Part 2
In this article, the author explains how a composite index can help with better query response time from DB.
Join the DZone community and get the full member experience.
Join For FreeIn this article, we will analyze how composite index can help us with better query response time from DB.
In SQL, a composite index is an index built on multiple columns within a table, instead of just one.
If you go through the first part of this series, SQL Query Performance Tuning in MySQL - Part 1, you will have a better understanding of the setup.
Let us start troubleshooting. First, we have to insert data in bulk.
- Create DB and tables. Here employee table is without any primary key and the employee is with a primary key.
mysql> create database testdb;
mysql> show databases;
mysql> use testdb;
mysql> CREATE TABLE employee1 (id int,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255));
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE employee2 (id int primary key,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255));
Query OK, 0 rows affected (0.02 sec
mysql> show tables;
+------------------+
| Tables_in_testdb |
+------------------+
| employee1 |
| employee2 |
+------------------+
2 rows in set (0.00 sec)
- Insert data in these tables using the following stored procedure. I have used multiple if-else just to avoid data in one sequence being inserted in tables.
CREATE DEFINER=`root`@`localhost` PROCEDURE `testdb`.`BulkInsert`()
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE j INT DEFAULT 1;
DECLARE k INT DEFAULT 25;
truncate table employee1;
truncate table employee2;
WHILE (i <= 40000) DO
INSERT INTO testdb.employee1 (id,LastName,FirstName,Address,profile) VALUES(i, CONCAT("last","-",j),CONCAT("first","-",j), CONCAT("address","-",k), CONCAT("engineer","-",i));
INSERT INTO testdb.employee2 (id,LastName,FirstName,Address,profile) VALUES(i, CONCAT("last","-",j),CONCAT("first","-",j), CONCAT("address","-",k), CONCAT("engineer","-",i));
SET i = i+1;
IF j < 100 then
SET j = j+1;
else
SET j = 1;
END IF;
IF k > 1 then
SET k = k-1;
else
SET k = 25;
END IF;
END WHILE;
END
- Call this stored procedure:
CALL testdb.BulkInsert();
- After a while, it will finish execution inserting records in tables. Check if records exist in tables.
select count(*) from employee1 e;
count(*)|
--------+
40000|
select count(*) from employee2 e;
count(*)|
--------+
40000|
- Create a composite index on three columns of
employee1
. We will not create any composite index on theemployee2
table so that we can compare performance.
mysql> CREATE INDEX compositeindex1 ON employee1 (FirstName,Address,profile);
#check indexes on employee1 table.
mysql> SHOW INDEXES FROM employee1 \G;
*************************** 1. row ***************************
Table: employee1
Non_unique: 1
Key_name: compositeindex1
Seq_in_index: 1
Column_name: FirstName
Collation: A
Cardinality: 101
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 2. row ***************************
Table: employee1
Non_unique: 1
Key_name: compositeindex1
Seq_in_index: 2
Column_name: Address
Collation: A
Cardinality: 101
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
*************************** 3. row ***************************
Table: employee1
Non_unique: 1
Key_name: compositeindex1
Seq_in_index: 3
Column_name: profile
Collation: A
Cardinality: 39920
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
3 rows in set (0.01 sec)
# check indexes on employee2 table
mysql> SHOW INDEXES FROM employee2 \G;
*************************** 1. row ***************************
Table: employee2
Non_unique: 0
Key_name: PRIMARY
Seq_in_index: 1
Column_name: id
Collation: A
Cardinality: 0
Sub_part: NULL
Packed: NULL
Null:
Index_type: BTREE
Comment:
Index_comment:
Visible: YES
Expression: NULL
1 row in set (0.00 sec)
- Now run select queries on these tables. What we find is there is some delay in query execution in the
employee2
table, which is expected as we have not created an index onemployee2
.
mysql> select * from employee1 where FirstName = "first-96" and Address ="address-5" and profile = "engineer-12196";
+-------+----------+-----------+-----------+----------------+
| id | LastName | FirstName | Address | profile |
+-------+----------+-----------+-----------+----------------+
| 12196 | last-96 | first-96 | address-5 | engineer-12196 |
+-------+----------+-----------+-----------+----------------+
1 row in set (0.00 sec)
mysql> select * from employee2 where FirstName = "first-96" and Address ="address-
+-------+----------+-----------+-----------+----------------+
| id | LastName | FirstName | Address | profile |
+-------+----------+-----------+-----------+----------------+
| 12196 | last-96 | first-96 | address-5 | engineer-12196 |
+-------+----------+-----------+-----------+----------------+
1 row in set (0.04 sec)
mysql> select * from employee1 where FirstName = "first-95" and Address ="address-
+-------+----------+-----------+-----------+----------------+
| id | LastName | FirstName | Address | profile |
+-------+----------+-----------+-----------+----------------+
| 39295 | last-95 | first-95 | address-6 | engineer-39295 |
+-------+----------+-----------+-----------+----------------+
1 row in set (0.00 sec)
mysql> select * from employee2 where FirstName = "first-95" and Address ="address-
+-------+----------+-----------+-----------+----------------+
| id | LastName | FirstName | Address | profile |
+-------+----------+-----------+-----------+----------------+
| 39295 | last-95 | first-95 | address-6 | engineer-39295 |
+-------+----------+-----------+-----------+----------------+
1 row in set (0.04 sec)
- Now let us analyze the performance of these queries in the
employee1
andemployee2
table. - For employees, data is looked up from indexes directly; hence, actual time is quite less in fetching data.
mysql> explain analyze select * from employee1 where FirstName = "first-95" and
Address ="address-6" and profile = "engineer-39295";
EXPLAIN
| -> Index lookup on employee1 using compositeindex1 (FirstName='first-95', Address='address-6', profile='engineer-39295') (cost=0.35 rows=1) (actual time=0.0858..0.0913 rows=1 loops=1)
1 row in set (0.01 sec)
Query Plan:
---------------
Index Lookup: The database is using an index named compositeindex1 to efficiently find rows in the employee1 table.
Columns Covered: The index covers three columns: FirstName, Address, and profile.
Search Criteria: The query is looking for rows where:
FirstName = 'first-95'
Address = 'address-6'
profile = 'engineer-39295'
Performance Metrics:
----------------------
Cost: The estimated cost of only 0.35 units indicates a very efficient query.
Actual Time: The query took between 0.0858 and 0.0913 seconds to execute, which is very fast.
Rows Returned: The query found and returned only 1 matching row.
Explanation:
--------------
The database leverages the composite index to directly locate the matching row without needing to scan the entire employee1 table. This is why the query is so efficient.
The low cost and execution time confirm that the query is well-optimized and the index is being used effectively.
- For employees: the index doesn't exist; complete table is scanned, which leads to delay.
mysql> explain analyze select * from employee2 where FirstName = "first-95" and
Address ="address-6" and profile = "engineer-39295";
| EXPLAIN |
| -> Filter: ((employee2.`profile` = 'engineer-39295') and (employee2.Address = 'address-6') and (employee2.FirstName = 'first-95')) (cost=4061 rows=40.1) (actual time=42.7..43.5 rows=1 loops=1)
-> Table scan on employee2 (cost=4061 rows=40051) (actual time=0.0694..30.6 rows=40000 loops=1)
1 row in set (0.05 sec)
Query Plan Breakdown:
----------------------
Filter: The query aims to retrieve rows from the employee2 table that meet all three conditions:
profile = 'engineer-39295'
Address = 'address-6'
FirstName = 'first-95'
Table Scan: The database is executing a full scan of the entire employee2 table to locate the matching rows. This means it's examining every single row in the table, even though only a small fraction might satisfy the filter conditions.
Performance Analysis:
----------------------
Cost: The estimated cost of 4061 units indicates a relatively expensive query in terms of database resources.
Actual Time: The query took 42.7 to 43.5 seconds to execute, which is a significant amount of time for a single query.
Rows Scanned: The database scanned all 40,000 rows in the employee2 table, even though it ultimately found only 1 matching row. This highlights a potential inefficiency.
Optimization Suggestions:
--------------------------
Create a Composite Index: The most impactful optimization would be to create a composite index on the three columns involved in the filter: (profile, Address, FirstName).
This index would allow the database to efficiently locate matching rows without having to scan the entire table, leading to a dramatic performance improvement.
Evaluate Maintenance Overhead:
-------------------------------
Indexes do create some additional overhead for updates and inserts, so it's essential to consider the trade-off between performance gains and maintenance costs.
Additional Considerations:
---------------------------
Index Specificity: Composite indexes are most beneficial for queries that frequently filter or join data based on the exact column combinations they cover.
Order of Columns: The order of columns in a composite index matters. The database will only use the index effectively if the query's conditions match the order of the columns in the index.
- Now let us check if this composite on three columns can help us even when we run a select query with two filters.
mysql> select count(*) from employee1 where FirstName = "first-96" and Address ="address-5";
+----------+
| count(*) |
+----------+
| 400 |
+----------+
1 row in set (0.00 sec)
mysql> explain analyze select count(*) from employee1 where FirstName = "first-96" and Address ="address-5";
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Aggregate: count(0) (cost=214 rows=1) (actual time=0.731..0.731 rows=1 loops=1)
-> Covering index lookup on employee1 using compositeindex1 (FirstName='first-96', Address='address-5') (cost=174 rows=400) (actual time=0.0337..0.697 rows=400 loops=1)
|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Query Plan:
------------
Aggregate: count(0): The query aims to count the number of rows that match certain criteria.
Covering Index Lookup: The database efficiently locates matching rows using a composite index named compositeindex1 on the employee1 table.
Columns Covered: FirstName and Address
Search Criteria:
FirstName = 'first-96'
Address = 'address-5'
Performance Metrics:
--------------------
Cost: Estimated cost is 214 units (low), indicating good efficiency.
Actual Time: Took 0.731 seconds to execute (relatively fast).
Rows Returned: Found 400 matching rows and returned the count (1).
- But if we have filters on columns B and C, then the complete table is scanned, and the composite key we created wouldn't be sufficient.
mysql> explain analyze select * from employee1 where Address = "address-5" and profile = "engineer-12196";
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: ((employee1.`profile` = 'engineer-12196') and (employee1.Address = 'address-5')) (cost=4048 rows=399) (actual time=21.1..68.7 rows=1 loops=1)
-> Table scan on employee1 (cost=4048 rows=39920) (actual time=0.0445..53.9 rows=40000 loops=1)
|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.07 sec)
- However, if we have a filter on columns A and C, then the composite key that we created is utilized.
mysql> explain analyze select count(*) from employee1 where FirstName = "first-96" and profile ="engineer-12196";
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Aggregate: count(0) (cost=31.2 rows=1) (actual time=0.0679..0.068 rows=1 loops=1)
-> Filter: ((employee1.`profile` = 'engineer-12196') and (employee1.FirstName = 'first-96')) (cost=27.2 rows=40) (actual time=0.0466..0.0631 rows=1 loops=1)
-> Covering index skip scan on employee1 using compositeindex1 over FirstName = 'first-96', profile = 'engineer-12196' (cost=27.2 rows=40) (actual time=0.043..0.0594 rows=1 loops=1)
|
Summary
Thus we can summarize that if we have more than one filter in an SQL query then for a large table if composite indexes are not available on those columns that are used as filters we could see performance problems.
Also, we found that the same composite key created on three or more columns would also help with fewer filters. Let's say we created a composite key on columns A, B, and C. It will help in query performance not only when filters are on A, B, and C but also when filters are on A and B or A and C. But when filters are on columns B and C then the complete table will be scanned and this composite key will not be utilized. Key points here are:
1. Index Ordering
- Composite indexes are ordered based on the column order specified during creation. In this case, the index is ordered by A, then B, then C.
- The database can efficiently use an index to quickly locate rows when the query filters on the leftmost columns of the index. However, if the query filters on columns that are not at the beginning of the index, it might not be able to effectively use the index.
2. Missing Leading Column
- Because the query filters only on B and C, it's missing a filter on the leading column A. This means the database can't directly use the index to narrow down the search space effectively.
- It would need to potentially scan through large portions of the index to find the relevant rows, which could be less efficient than a full table scan in some cases.
To potentially improve performance in this scenario, consider:
- Creating a separate index on columns B and C: This would allow the query to directly use the index for filtering, potentially avoiding a table scan.
That's it guys! I hope you will find this article helpful and interesting. I used Bard for query analysis.
Opinions expressed by DZone contributors are their own.
Comments