8 Rules For Creating Useful Database Indexes
Creating database indexes is easy, but creating useful database indexes is quite challenging.
Creating an index can be done via the SQL CREATE INDEX statement or programmatically via JPA 2.1 or Hibernate-specific annotations.
JPA 2.1 @Index
Starting with JPA 2.1, we can easily create indexes via the @Index
annotation as in the following example:
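import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Index;
import javax.persistence.Table;

// @Index only takes effect when the JPA provider generates the schema
@Entity
@Table(
  name = "author",
  indexes = {
    // unique index on the name column
    @Index(
      name = "index_name",
      columnList = "name",
      unique = true
    ),
    // non-unique index on the genre column
    @Index(
      name = "index_genre",
      columnList = "genre",
      unique = false
    )
  })
public class Author {

  // every entity needs an identifier
  @Id
  @GeneratedValue
  private Long id;

  @Column(name = "name", nullable = false)
  private String name;

  @Column(name = "genre", nullable = false)
  private String genre;
}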
Or, to define a multi-column (composite) index, follow this example:
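import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.Index;
import javax.persistence.Table;

@Entity
@Table(
  name = "author",
  indexes = {
    // one unique index spanning two columns: the (name, genre) pair must be unique
    @Index(
      name = "index_name_genre",
      columnList = "name, genre",
      unique = true
    )
  })
public class Author {

  @Id
  @GeneratedValue
  private Long id;

  @Column(name = "name", nullable = false)
  private String name;

  @Column(name = "genre", nullable = false)
  private String genre;
}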
Hibernate ORM also provides org.hibernate.annotations.Index, but it is deprecated, so rely on the JPA 2.1 approach instead.
Ideally, we create indexes to optimize the performance of our database and SQL queries: we create super-fast data access paths that avoid scanning the tablespace. But that's easy to say and hard to execute properly. What is the best set of indexes for your tables? How do you decide that an index is needed? How do you decide that an index is useless? These are hard questions, and the answers are tightly coupled to the queries you execute. Nevertheless, let's highlight a developer-oriented set of guidelines that applies in most cases.
1. Don't Guess the Indexes
Over the years, I have seen the following bad practice for creating database indexes: look at the tables (schema) and, without knowing how these tables will be accessed, try to guess the proper indexes. It's like trying to guess the queries that will be executed, and most of the time the result is far from accurate.
As a rule of thumb, to create the proper set of indexes, try to:
- get the list of SQL queries to be used
- estimate the frequency of each SQL query
- try to score the importance of each SQL query
Having these three coordinates, find the set of indexes that brings the biggest optimizations with the smallest trade-offs.
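The three coordinates above can be turned into a rough priority list. The following Java sketch (Java 16+, for records) is only an illustration: the IndexPriority class, the SqlQuery record, the sample queries, their daily frequencies, and the 1-to-5 importance scale are all hypothetical, and multiplying frequency by importance is just one simple way to weigh them against each other.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IndexPriority {

    // Hypothetical description of a query: its SQL text, how often it runs per day,
    // and a business/user importance score on an assumed 1-5 scale.
    record SqlQuery(String sql, long dailyFrequency, int importance) {
        // One simple (assumed) way to combine the coordinates: frequency weighted by importance.
        long score() {
            return dailyFrequency * importance;
        }
    }

    // Returns the queries sorted from the highest to the lowest score,
    // i.e., the order in which they deserve attention when designing indexes.
    static List<SqlQuery> rank(List<SqlQuery> queries) {
        List<SqlQuery> ranked = new ArrayList<>(queries);
        ranked.sort(Comparator.comparingLong(SqlQuery::score).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        List<SqlQuery> queries = List.of(
            new SqlQuery("SELECT * FROM author WHERE name = ?", 50_000, 3),
            new SqlQuery("SELECT * FROM book WHERE genre = ? ORDER BY publication_date DESC", 20_000, 5),
            new SqlQuery("SELECT genre, COUNT(*) FROM book GROUP BY genre", 10, 4));
        // Prints the author lookup first (150000), then the book query (100000), then the report (40).
        for (SqlQuery q : rank(queries)) {
            System.out.println(q.score() + "  " + q.sql());
        }
    }
}
```

The top of the ranked list is where indexing effort usually pays off first; the scoring formula itself should be adapted to your own context.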
2. Prioritize the Most Used SQL Queries for Indexing
This step expands on the second bullet above: estimating the frequency of each query. The most used SQL queries should get the highest priority for indexing. If they are optimized, there is a good chance that the application as a whole performs well.
As a rule of thumb, create indexes for the most used (heavily exploited) SQL queries and build these indexes based on the query predicates (e.g., the conditions in the WHERE clause).
3. Important SQL Queries Deserve Indexes
When we talk about query importance, we primarily consider the importance of the query for the business and, secondarily, its importance for the user. For example, if a query runs every day for banking transactions or is run by an important user (e.g., CIO/CDIO), it might deserve its own index. But if a query is just a simple routine or is executed by a clerk, then the existing indexes should provide sufficient optimization. Of course, this is not a strict rule for weighing a query's importance; you have to decide based on your own context.
4. Avoid Sorting Operations by Indexing Group By and Order By
Calling SQL clauses such as GROUP BY and ORDER BY may trigger sorting operations. These operations are typically slow (resource-intensive) and therefore prone to add performance penalties (e.g., as ORDER BY does in SQL queries used for pagination).
By indexing the columns specified in GROUP BY and ORDER BY, we can take advantage of optimizations that avoid sorting operations, since an index keeps the indexed data pre-ordered. Instead of sorting, the relational database may read the rows in index order. Here is an example:
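-- prev_publication_date and prev_id stand for the values of the last row
-- returned by the previous page (keyset pagination)
SELECT * FROM book
WHERE genre = 'History'
  AND (publication_date, id) < (prev_publication_date, prev_id)
ORDER BY publication_date DESC, id DESC
LIMIT 50;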
To optimize this query, we can create an index as follows:
CREATE INDEX book_idx ON book (publication_date, id);
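Or, even better, put genre first: since it is filtered by equality, the database can go straight to the 'History' entries and read them already ordered by (publication_date, id):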
CREATE INDEX book_idx ON book (genre, publication_date, id);
This time, the database can rely on the index order and skip the explicit sort operation.
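To make the idea concrete, the following toy Java sketch (Java 16+, for records and Stream.toList()) models the (genre, publication_date, id) index as a TreeSet kept in the same order as the index. It is not how a database actually stores a B-tree, and the KeysetPageSketch class, the IndexKey record, and the sample rows are hypothetical. A page is produced by walking the ordered entries backward from the previous page's last key, so the rows come out in ORDER BY publication_date DESC, id DESC order without any sort step.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class KeysetPageSketch {

    // One entry of a hypothetical index on book (genre, publication_date, id).
    record IndexKey(String genre, LocalDate publicationDate, long id) {}

    // Entries are kept ordered by genre, then publication_date, then id.
    static final Comparator<IndexKey> INDEX_ORDER = Comparator
        .comparing(IndexKey::genre)
        .thenComparing(IndexKey::publicationDate)
        .thenComparingLong(IndexKey::id);

    // Mirrors: WHERE genre = ? AND (publication_date, id) < (?, ?)
    //          ORDER BY publication_date DESC, id DESC LIMIT ?
    // It walks the ordered index backward from the bound; no sort is performed.
    static List<IndexKey> nextPage(NavigableSet<IndexKey> index, String genre,
                                   LocalDate prevDate, long prevId, int limit) {
        IndexKey bound = new IndexKey(genre, prevDate, prevId);
        List<IndexKey> page = new ArrayList<>();
        for (IndexKey key : index.headSet(bound, false).descendingSet()) {
            if (page.size() == limit || !key.genre().equals(genre)) {
                break; // page is full, or we left the entries of this genre
            }
            page.add(key);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<IndexKey> index = new TreeSet<>(INDEX_ORDER);
        index.add(new IndexKey("History", LocalDate.of(2020, 1, 10), 1));
        index.add(new IndexKey("History", LocalDate.of(2021, 5, 3), 2));
        index.add(new IndexKey("History", LocalDate.of(2021, 5, 3), 7));
        index.add(new IndexKey("Horror", LocalDate.of(2022, 2, 1), 3));
        index.add(new IndexKey("Anthology", LocalDate.of(2023, 8, 9), 4));

        // First page: use a bound above every real (publication_date, id) pair.
        List<IndexKey> first = nextPage(index, "History", LocalDate.MAX, Long.MAX_VALUE, 2);
        System.out.println(first.stream().map(IndexKey::id).toList()); // [7, 2]

        // Next page: the bound is the last row of the previous page.
        IndexKey last = first.get(first.size() - 1);
        List<IndexKey> second = nextPage(index, "History", last.publicationDate(), last.id(), 2);
        System.out.println(second.stream().map(IndexKey::id).toList()); // [1]
    }
}
```

Putting genre first in the key is what lets the walk stop as soon as it leaves the 'History' entries, which is the same reason the (genre, publication_date, id) index is the better choice for the query.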
5. Rely on Indexes for Uniqueness
Most databases require unique indexes for primary keys and unique constraints, and they create these indexes automatically when the constraints are declared as part of the schema definition. Writing your SQL queries so that they can use these required indexes brings important benefits.
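The following toy Java sketch models a unique index as a HashMap from the indexed value to a row id. Real databases usually use B-tree structures, and the UniqueIndexSketch class, its methods, and the sample name are hypothetical. It shows the two sides of the same index: enforcing uniqueness on insert, and answering an equality lookup with a single probe that returns at most one row.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class UniqueIndexSketch {

    // Hypothetical unique index on author.name: each key maps to exactly one row id.
    private final Map<String, Long> nameToRowId = new HashMap<>();

    // Enforces the unique constraint: the insert is rejected if the key already exists.
    boolean insert(String name, long rowId) {
        return nameToRowId.putIfAbsent(name, rowId) == null;
    }

    // A query filtering on the unique column needs a single index probe
    // and can return at most one row.
    Optional<Long> findRowIdByName(String name) {
        return Optional.ofNullable(nameToRowId.get(name));
    }

    public static void main(String[] args) {
        UniqueIndexSketch index = new UniqueIndexSketch();
        System.out.println(index.insert("Jane Doe", 1));        // true
        System.out.println(index.insert("Jane Doe", 2));        // false: duplicate rejected
        System.out.println(index.findRowIdByName("Jane Doe"));  // Optional[1]
    }
}
```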
6. Rely on Indexes for Foreign Keys
As mentioned in the previous step, a primary key constraint requires a unique index. Since this index is created automatically, the parent table's side already benefits from indexing. On the other hand, a foreign key is a column (or combination of columns) that appears in the child table and is used to define a relationship and ensure the integrity between the parent and child tables.
It is highly recommended to create an index on the foreign key columns of the child table.
While the unique index for the primary key is created automatically, indexing the foreign key is often up to the database administrator or the developers. If the database doesn't automatically create indexes for foreign keys (e.g., SQL Server), these indexes must be created manually. Note that such an index is usually not unique, since many child rows can reference the same parent row.
Among the benefits of indexing foreign keys, we have:
- better performance for SQL JOINs between the child and the parent table on the indexed foreign key
- a lower cost for UPDATE and DELETE statements on the parent table that imply cascading (CASCADE) or no action (NO ACTION), since the database can locate the referencing child rows via the index instead of scanning the child table
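A toy Java sketch of the difference, assuming a hypothetical child table book(id, author_id) that references author: without an index, finding the children of an author (as a cascading DELETE or a NO ACTION check must do) means examining every child row; with an index on author_id, only the matching entries are visited. The ForeignKeyIndexSketch class and its methods are hypothetical, and a HashMap stands in for the index structure. Note that the index maps one key to many row ids, because it is not unique. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ForeignKeyIndexSketch {

    // A row of the hypothetical child table book(id, author_id).
    record Book(long id, long authorId) {}

    // Without an index on author_id: every child row must be examined.
    static List<Long> childrenByScan(List<Book> books, long authorId) {
        List<Long> ids = new ArrayList<>();
        for (Book b : books) {
            if (b.authorId() == authorId) {
                ids.add(b.id());
            }
        }
        return ids;
    }

    // Builds a hypothetical non-unique index on the foreign key: author_id -> book ids.
    static Map<Long, List<Long>> buildFkIndex(List<Book> books) {
        Map<Long, List<Long>> index = new HashMap<>();
        for (Book b : books) {
            index.computeIfAbsent(b.authorId(), k -> new ArrayList<>()).add(b.id());
        }
        return index;
    }

    // With the index: only the entries for the given parent are visited.
    static List<Long> childrenByIndex(Map<Long, List<Long>> fkIndex, long authorId) {
        return fkIndex.getOrDefault(authorId, List.of());
    }

    public static void main(String[] args) {
        List<Book> books = List.of(new Book(1, 100), new Book(2, 200), new Book(3, 100));
        Map<Long, List<Long>> fkIndex = buildFkIndex(books);
        System.out.println(childrenByScan(books, 100));     // [1, 3]
        System.out.println(childrenByIndex(fkIndex, 100));  // [1, 3]
    }
}
```

Both paths return the same rows; the difference is that the scan cost grows with the size of the whole child table, while the index lookup only touches the matching children.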
As a rule of thumb, after schema modifications, test and monitor the indexes to make sure that existing or additional indexes don't have a negative impact on performance.
7. Add Columns for Index-Only Access
Adding columns for index-only access is a technique known as index overloading. Basically, we create an index containing all the columns needed to satisfy the query (such an index is also called a covering index). This means that the query does not need to read data from the tablespace, which results in fewer I/O operations.
For example, consider the following query:
SELECT isbn
FROM book
WHERE genre = 'History';
And the following index:
CREATE INDEX book_idx ON book (genre);
The index can be used to find the rows with a given genre, but the database would still need to access the data in the tablespace to return the isbn. By adding the isbn column to the index, we have:
CREATE INDEX book_idx ON book (genre, isbn);
Now, all of the data needed for the above query exists in the index and no additional tablespace operations are needed.
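As an illustration only, the following toy Java sketch (Java 16+, for records) models the book_idx index on (genre, isbn) as an ordered TreeSet of composite entries. The CoveringIndexSketch class, the Entry record, and the sample ISBN values are hypothetical. Because each entry already carries the isbn, the query is answered by seeking to the first entry of the genre and reading entries until the genre changes, with no lookup into the table rows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class CoveringIndexSketch {

    // Entry of the hypothetical index book_idx ON book (genre, isbn).
    record Entry(String genre, String isbn) {}

    // Entries are ordered by genre, then isbn, like the composite index.
    static final Comparator<Entry> ORDER =
        Comparator.comparing(Entry::genre).thenComparing(Entry::isbn);

    // Answers SELECT isbn FROM book WHERE genre = ? using only the index:
    // ("genre", "") sorts before every real entry of that genre, so tailSet seeks to its first entry.
    static List<String> isbnsByGenre(NavigableSet<Entry> index, String genre) {
        List<String> isbns = new ArrayList<>();
        for (Entry e : index.tailSet(new Entry(genre, ""), true)) {
            if (!e.genre().equals(genre)) {
                break; // past the last entry of this genre
            }
            isbns.add(e.isbn());
        }
        return isbns;
    }

    public static void main(String[] args) {
        NavigableSet<Entry> index = new TreeSet<>(ORDER);
        index.add(new Entry("History", "978-2"));
        index.add(new Entry("Horror", "978-3"));
        index.add(new Entry("History", "978-1"));
        index.add(new Entry("Anthology", "978-4"));
        System.out.println(isbnsByGenre(index, "History")); // [978-1, 978-2]
    }
}
```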
8. Avoid Bad Standards
From coding style standards to recommended snippets of code for specific problems, companies love to use standards. Sometimes, bad standards sneak in among them. One of the bad standards I have seen says to limit the number of indexes per table to a certain value. This value varies between standards (e.g., 3, 5, 8), and that alone should make you raise an eyebrow.
It doesn't matter how many indexes you have created per table! What matters is that every index you create increases or sustains the performance of your queries and doesn't cause significant issues in the efficiency of data modification. Data modifications (INSERT, UPDATE, DELETE) require extra operations for maintaining the indexes as well. In a nutshell, database indexes speed up retrieval (SELECT) but slow down modification (INSERT, UPDATE, DELETE). So, as a rule of thumb, create as many indexes as needed to support your database queries, as long as you are satisfied with the trade-off between retrieval and data modification.
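The modification side of this trade-off can be illustrated with a toy Java sketch (Java 16+, for records) in which every INSERT must also add an entry to each secondary index of the table. The IndexMaintenanceSketch class, the Book record, and the chosen indexed columns are hypothetical; the point is only that the index maintenance work grows with the number of indexes, which is why each index should earn its place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;

public class IndexMaintenanceSketch {

    // Hypothetical row of the book table.
    record Book(long id, String genre, String isbn, String title) {}

    // A secondary index: an ordered map from the indexed column value to row ids.
    static final class SecondaryIndex {
        final Function<Book, String> column;
        final TreeMap<String, List<Long>> entries = new TreeMap<>();

        SecondaryIndex(Function<Book, String> column) {
            this.column = column;
        }

        void add(Book b) {
            entries.computeIfAbsent(column.apply(b), k -> new ArrayList<>()).add(b.id());
        }
    }

    final List<Book> rows = new ArrayList<>();
    final List<SecondaryIndex> indexes = new ArrayList<>();
    long indexWrites = 0; // number of index entries maintained so far

    // Every INSERT writes the row and then one entry into each index.
    void insert(Book b) {
        rows.add(b);
        for (SecondaryIndex index : indexes) {
            index.add(b);
            indexWrites++;
        }
    }

    public static void main(String[] args) {
        IndexMaintenanceSketch table = new IndexMaintenanceSketch();
        table.indexes.add(new SecondaryIndex(Book::genre));
        table.indexes.add(new SecondaryIndex(Book::isbn));
        table.indexes.add(new SecondaryIndex(Book::title));
        for (long id = 1; id <= 1_000; id++) {
            table.insert(new Book(id, "History", "isbn-" + id, "Title " + id));
        }
        // 1,000 rows x 3 indexes = 3,000 index writes on top of the row writes.
        System.out.println(table.indexWrites); // 3000
    }
}
```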
If you liked this article, then you'll like my book containing 150+ performance items: Spring Boot Persistence Best Practices.
This book helps every Spring Boot developer squeeze the most performance out of the persistence layer.
Opinions expressed by DZone contributors are their own.