What Are Hypothetical Indexes in PostgreSQL?
Explore why you might need hypothetical indexes in PostgreSQL and how to use them,
Join the DZone community and get the full member experience.
Join For FreeAs the name suggests, "hypothetical indexes" are not real indexes. They are virtual indexes that PostgreSQL query planner does not consider when running queries.
So when and where are these Hypothetical Indexes useful?
First, let’s discuss a scenario in which we have a large table that is currently in the production environment. We need to make some indexes on live DB, and we are not sure whether that index will be useful or not. We don’t even know if by making that index, our production environment may be down!
The solution:
- Let's ignore the risk and make the index on the live table. Why?
- It will take lots of time depending on the data present.
- Live queries may be affected badly if we are not sure if the index we are making will increase or decrease the cost.
- We also do not know the size of the index may be too large, which can again impact the production database server.
- Replicate the production database to the local dev environment and apply all the hit. Then, apply it at the production environment. It seems like a very safe and effective approach in almost all cases, but this will often take too much time in setting up and testing.
- Hypothetical Indexes.This functionality will create imaginary indexes, not real indexes. However, there are some things to note about these indexes:
- It creates an index in our connection’s private memory. None of the catalog tables nor the original tables are actually touched.
- The only way to see if we can benefit from that index is by running an EXPLAIN <QUERY>.
- If you wish to run an EXPLAIN ANALYZE that runs that SQL and provides the run time stats, it would not be able to use that hypothetical index. This is because a hypothetical index does not exist in reality.
- Currently, these indexes will work on BTREE ONLY. However, you can try if it works on other types of indexes.
Uses of Hypothetical Indexes
Installation
I am using PostgreSQL10 on CentOS7.
Download hypopg by the following command:
Wget https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7-x86_64/hypopg_10-1.1.4-1.rhel7.x86_64.rpm
Then, install it on CentOS7:
yum install hypopg_10-1.1.4-1.rhel7.x86_64.rpm
Now, create extension using the following query:
testdb=# CREATE EXTENSION hypopg ;
CREATE EXTENSION
On Creating extension following functions will be created:
testdb=# select proname from pg_proc where proname ilike '%hyp%';
proname
----------------------
hypopg_reset_index
hypopg_reset
hypopg_create_index
hypopg_drop_index
hypopg
hypopg_list_indexes
hypopg_relation_size
hypopg_get_indexdef
Usage
Now let’s make a table with 10 Crores rows as follows:
testdb=# create table orders as select s as orderno, md5(random()::text) as orderitem , now() as order_created from generate_Series(1,100000000) s;
SELECT 100000000
Next, check the COST of a query by running explain:
testdb=# explain select * from orders where orderno > 80000 order by order_created desc limit 100 ;
QUERY PLAN
--------------------------------------------------------------------------------
Limit (cost=3600088.98..3600089.23 rows=100 width=44)
-> Sort (cost=3600088.98..3688095.27 rows=35202513 width=44)
Sort Key: order_created DESC
-> Seq Scan on orders (cost=0.00..2254674.25 rows=35202513 width=44)
Filter: (orderno > 80000)
(5 rows)
Now, create the Hypothetical Index:
testdb=# SELECT * FROM hypopg_create_index('create index order_created_idx on orders(order_created)');
indexrelid | indexname
------------+-----------------------------------
24797 | <24797>btree_orders_order_created
(1 row)
Next, repeat Explain to check if the above index may be useful or not:
testdb=# explain select * from orders where orderno > 80000 order by order_created desc limit 100 ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.07..4.29 rows=100 width=45)
-> Index Scan Backward using "<24797>btree_orders_order_created" on orders (cost=0.07..4215496.19 rows=99917459 width=45)
Filter: (orderno > 80000)
(3 rows)
Now we can clearly see the difference in cost and can also see that planner is using a newly created hypothetical index.
We can also Drop the index as follows:
testdb=# select * from hypopg_drop_index(24797);
hypopg_drop_index
-------------------
t
(1 row)
Then, check the estimated size of the index created virtually as follows:
testdb=# SELECT * FROM hypopg_create_index('create index order_created_idx on orders(order_created)');
indexrelid | indexname
------------+-----------------------------------
24798 | <24798>btree_orders_order_created
(1 row)
testdb=# select * from pg_size_pretty(hypopg_relation_size(24798));
pg_size_pretty
----------------
2990 MB
(1 row)
Now, let's create an actual index and see what will be the actual size:
testdb=# create index order_created_idx on orders(order_created);
CREATE INDEX
testdb=# \di+ order_created_idx
List of relations
Schema | Name | Type | Owner | Table | Size | Description
--------+-------------------+-------+----------+--------+---------+-------------
public | order_created_idx | index | postgres | orders | 2142 MB |
(1 row)
As seen, the estimated and actual size is comparable.
I hope it clears the usage of the hypothetical indexes in PostgreSQL. In one of our blogs, we learned about why the index is not working and also how to check on which tables the index needed.
In our future blogs, we will share how you will get to know the exact index you need to make in the database.
Stay tuned to hello worlds.
Published at DZone with permission of Sahil Aggarwal. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments