Deriving Ideal Indexes: A Guide
Indexes are there to simplify our work when searching for data: in this blog, we tell you how you should derive ideal indexes for your database.
Join the DZone community and get the full member experience.
Join For FreeIndexes are there to simplify our work when searching for data: they speed up SELECT
queries at the expense of slowing down other kinds of queries like DELETE
s, UPDATE
s, and INSERT
s instead. However, as awesome as indexes might be, they also need a lot of work to get right — in this blog, we will tell you how you should go about deriving ideal indexes for your database. The majority of the examples in this article will focus on MySQL: however, the concept is the same for all major database management systems available on the market today.
What Are Indexes?
If you are familiar with database structures, great — because that's essentially what indexes are! Indexes are database structures that can be used to quickly find rows having specific column values. At the expense of taking up disk space and time if your tables are big and you find yourself adding indexes on top of them, indexes allow databases to skip reading through entire tables and instead, only scan relevant rows which means that databases have less data to scan through.
What Are Ideal Indexes?
With that out of the way, what is an "ideal" index? We could define a perfect index like so — such an index is any index that is capable of helping us pursue our data reading goals without putting a toll on time, without occupying much disk space in return and not slowing down INSERT
s, UPDATE
s, and DELETE
s to a point where work is incomprehensible. In other words, an ideal index is any kind of index that allows us to speed up search operations, but at the same time doesn't put a huge strain to maintain a database.
Before choosing an "ideal" index for our database instance, we must decide what index type we are going to employ. For that, it is always useful to employ a questionnaire:
- What kind of searches are we running? Do we deal with wildcards? A B-Tree index would do.
- Do we need exotic search operations (think queries involving wildcards and the like)? A
FULLTEXT
index will do. - Do we work with data that needs to be unique (think about data without duplicates)? We should head over to the
UNIQUE INDEX
space.
Once we have chosen our index type, it's type to derive an ideal index.
Deriving Ideal Indexes
To not bore you with theory, we are going to get straight to the point — how do you derive an ideal index? It must be different for every scenario, no?
In most cases, the approach outlined below would do:
- Pick an index that most suits your use case. In most cases, ordinary indexes would do. In others when we find ourselves searching for Boolean and other data, we employ a
FULLTEXT
index. For more details, refer to our indexing for high-performance post. - Ensure your queries are designed properly — ensure that your
SELECT
query only selects the fields that need to be returned instead of usingSELECT *
and employ aWHERE
clause — if it's not there, indexes are generally useless and won't be used.
The entire situation could look something like this.
a) Suppose you have five columns and you need to index three of them.
- You decide to use a B-Tree index and index multiple columns at the same time.
- Put the column that you use straight after the
WHERE
clause into the index first. - If you use
ORDER BY
, you should add the columns that you use this statement on afterwards. Do not use the columns that you've used in the previous step and do not change their order: that will make your query performance a little worse. - If we want our query to be as fast as possible, we add all other columns that were left out after completing the third step.
The steps above describe a covering index. Covering indexes are a type of index where all of the columns required for a query to execute are included in the index and it's frequently added to a table when using ALTER TABLE
queries. Here's how one ALTER TABLE
query would look in such a scenario:
ALTER TABLE demo_table ADD INDEX demo_idx(demo_c1, demo_c2, demo_c3);
demo_c1
is the column that goes straight after the WHERE
clause, demo_c2
is the column that goes straight after the ORDER BY
statement, and demo_c3
is our "remaining" column. That's it: we now have a covering index! Now we will tell you what your SELECT
query would look like. In most cases, it would look something like this:
SELECT demo_c3 FROM demo_table WHERE demo_c1 = ‘Demo’ [ORDER BY demo_c2 ASC|DESC];
demo_c1
, in this case, would be the column that the search is being run on: demo_c2
would have integer values or dates, so MySQL could sort it. You get the idea.
Designing indexes in such a way might not always be possible, however, always consider how much data you have and what kind of queries you run. If you work with bigger data sets and indexing the entire column is simply not feasible, you might also find that it might be feasible to index a prefix of the data to make your query performance a little better: such an index won't occupy a big chunk of hard drive space and it will offer you slightly better performance as well.
For many other cases, a mix of performance optimization and indexing will be required and your steps will require work with a B-Tree index. Follow these steps:
- Get an approximate number of the rows that are being held in your table that you need to index and ensure the table is running InnoDB as the primary storage engine (MyISAM is known to be unreliable as it is prone to crashes and similar things.)
- Make sure your InnoDB engine is optimized for high performance by modifying your my.cnf file (modify my.ini if you intend to run Windows.)
- Only add the index on columns that you run
SELECT
queries on. If you intend toALTER
the table, ensure to do it after you load data into it and not before as MySQL and its flavors make a copy of the table when altering it.
Follow these steps, ensure you read on our indexing and optimizing for high-performance tutorials, and you should be good to go in regards to your database performance. But if you see that your database performance is still not up to par, it might be time to glance into your database structure.
Summary
Deriving ideal indexes in any kind of database management system is not a very straightforward process, however, with enough knowledge about queries, database structure, and data sets in general we can make it work — we hope that this blog post has provided you with some information on that front.
Published at DZone with permission of Lukas Vileikis. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments