What Are Hypothetical Indexes in PostgreSQL?

Explore why you might need hypothetical indexes in PostgreSQL and how to use them,

Sahil Aggarwal

Jun. 01, 21 · Tutorial

Likes (8)

Comment

Save

9.1K Views

As the name suggests, "hypothetical indexes" are not real indexes. They are virtual indexes that PostgreSQL query planner does not consider when running queries.

So when and where are these Hypothetical Indexes useful?

First, let’s discuss a scenario in which we have a large table that is currently in the production environment. We need to make some indexes on live DB, and we are not sure whether that index will be useful or not. We don’t even know if by making that index, our production environment may be down!

The solution:

Let's ignore the risk and make the index on the live table. Why?
- It will take lots of time depending on the data present.
- Live queries may be affected badly if we are not sure if the index we are making will increase or decrease the cost.
- We also do not know the size of the index may be too large, which can again impact the production database server.
Replicate the production database to the local dev environment and apply all the hit. Then, apply it at the production environment. It seems like a very safe and effective approach in almost all cases, but this will often take too much time in setting up and testing.
Hypothetical Indexes.This functionality will create imaginary indexes, not real indexes. However, there are some things to note about these indexes:
- It creates an index in our connection’s private memory. None of the catalog tables nor the original tables are actually touched.
- The only way to see if we can benefit from that index is by running an EXPLAIN <QUERY>.
- If you wish to run an EXPLAIN ANALYZE that runs that SQL and provides the run time stats, it would not be able to use that hypothetical index. This is because a hypothetical index does not exist in reality.
- Currently, these indexes will work on BTREE ONLY. However, you can try if it works on other types of indexes.

Uses of Hypothetical Indexes

Installation

I am using PostgreSQL10 on CentOS7.

Download hypopg by the following command:

    Java
   
   Wget  https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7-x86_64/hypopg_10-1.1.4-1.rhel7.x86_64.rpm

Then, install it on CentOS7:

    Java
   
   yum install hypopg_10-1.1.4-1.rhel7.x86_64.rpm

Now, create extension using the following query:

    Java
   
   testdb=# CREATE EXTENSION hypopg ;
CREATE EXTENSION

On Creating extension following functions will be created:

    Java
   
 

   testdb=#  select proname from pg_proc where proname ilike '%hyp%';
       proname        
----------------------
 hypopg_reset_index
 hypopg_reset
 hypopg_create_index
 hypopg_drop_index
 hypopg
 hypopg_list_indexes
 hypopg_relation_size
 hypopg_get_indexdef
  

Usage

Now let’s make a table with 10 Crores rows as follows:

    Java
   
   testdb=# create table orders as select s as orderno, md5(random()::text) as orderitem , now() as order_created from generate_Series(1,100000000) s;
SELECT 100000000

Next, check the COST of a query by running explain:

    Java
   
 

   testdb=# explain select * from orders where orderno > 80000  order by order_created desc  limit 100 ;
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Limit  (cost=3600088.98..3600089.23 rows=100 width=44)
   ->  Sort  (cost=3600088.98..3688095.27 rows=35202513 width=44)
         Sort Key: order_created DESC
         ->  Seq Scan on orders  (cost=0.00..2254674.25 rows=35202513 width=44)
               Filter: (orderno > 80000)
(5 rows)
  

Now, create the Hypothetical Index:

    Java
   
 

   testdb=# SELECT * FROM hypopg_create_index('create index order_created_idx on orders(order_created)');
 indexrelid |             indexname             
------------+-----------------------------------
      24797 | <24797>btree_orders_order_created
(1 row)
  

Next, repeat Explain to check if the above index may be useful or not:

    Java
   
 

   testdb=# explain select * from orders where orderno > 80000  order by order_created desc  limit 100 ;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.07..4.29 rows=100 width=45)
   ->  Index Scan Backward using "<24797>btree_orders_order_created" on orders  (cost=0.07..4215496.19 rows=99917459 width=45)
         Filter: (orderno > 80000)
(3 rows)
  

Now we can clearly see the difference in cost and can also see that planner is using a newly created hypothetical index.

We can also Drop the index as follows:

    Java
   
 

   testdb=# select * from hypopg_drop_index(24797);
 hypopg_drop_index 
-------------------
 t
(1 row)
  

Then, check the estimated size of the index created virtually as follows:

    Java
   
 

   testdb=# SELECT * FROM hypopg_create_index('create index order_created_idx on orders(order_created)');
 indexrelid |             indexname             
------------+-----------------------------------
      24798 | <24798>btree_orders_order_created
(1 row)

testdb=# select * from pg_size_pretty(hypopg_relation_size(24798));
 pg_size_pretty 
----------------
 2990 MB
(1 row)
  

Now, let's create an actual index and see what will be the actual size:

    Java
   
 

   testdb=# create index order_created_idx on orders(order_created);
CREATE INDEX
testdb=# \di+ order_created_idx
                               List of relations
 Schema |       Name        | Type  |  Owner   | Table  |  Size   | Description 
--------+-------------------+-------+----------+--------+---------+-------------
 public | order_created_idx | index | postgres | orders | 2142 MB | 
(1 row)
  

As seen, the estimated and actual size is comparable.

I hope it clears the usage of the hypothetical indexes in PostgreSQL. In one of our blogs, we learned about why the index is not working and also how to check on which tables the index needed.

In our future blogs, we will share how you will get to know the exact index you need to make in the database.

Stay tuned to hello worlds.

Database PostgreSQL Java (programming language)

Published at DZone with permission of Sahil Aggarwal. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending