Partitioning Disaster Recovery With pg_partman
Partitioning is a great tool, but without regular maintenance the situation can turn ugly. Here is how to delete data from the default partition and insert it back into the relevant partitions.
Partitions are an excellent mechanism to reduce your queryable data size. pg_partman is a tool that enables DB admins to create partitions in a Postgres database. It is packed with features and offers a lot of functionality to create and manage partitioned tables.
Ideally, when partitions are created using pg_partman, we need a monitoring mechanism to ensure that we have enough partitions to accommodate our incoming data as time passes. This post explains how to monitor and auto-create partitions with pg_partman on AWS infrastructure.
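As a quick health check, pg_partman's own configuration table records how far ahead partitions are pre-created. Assuming the extension lives in a schema named partman, the relevant settings can be inspected like this:
-- For each managed table: how often partitions roll over and how many
-- are pre-created ahead of time (premake)
SELECT parent_table, partition_interval, premake
FROM partman.part_config;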
However, if we have no monitoring mechanism in place and run out of partitions, we are in a difficult situation: we have to write a custom script to create new partitions and move data out of the default partition. This post walks through how to get out of such a problematic partition situation.
The Problem
Let’s say that we work for a marketing company that sends communications to users on behalf of its customers. We have a table, customer_comms, which logs all the communications sent to users and is partitioned by month. The table is queried at runtime to check whether a customer has enough credit in their account to send the next message to a user.
The customer_comms table looks like this:
Partitioned table "public.customer_comms"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------+--------------------------+-----------+----------+---------------------------------------------------+----------+-------------+--------------+-------------
id | bigint | | not null | nextval('customer_comms_parent_id_seq'::regclass) | plain | | |
customer_id | bigint | | not null | | plain | | |
user_id | bigint | | not null | | plain | | |
title | character varying(500) | | not null | | extended | | |
body | text[] | | not null | | extended | | |
sent_at | timestamp with time zone | | not null | now() | plain | | |
Partition key: RANGE (sent_at)
Indexes:
"customer_comms_sent_at_idx" btree (sent_at)
Partitions: customer_comms_p2022_10 FOR VALUES FROM ('2022-10-01 00:00:00+05:30') TO ('2022-11-01 00:00:00+05:30'),
customer_comms_p2022_11 FOR VALUES FROM ('2022-11-01 00:00:00+05:30') TO ('2022-12-01 00:00:00+05:30'),
customer_comms_p2022_12 FOR VALUES FROM ('2022-12-01 00:00:00+05:30') TO ('2023-01-01 00:00:00+05:30'),
customer_comms_p2023_01 FOR VALUES FROM ('2023-01-01 00:00:00+05:30') TO ('2023-02-01 00:00:00+05:30'),
customer_comms_default DEFAULT
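For context, a table like this would typically have been registered with pg_partman via its create_parent() function. Here is a sketch of how that might look; the parameter values are illustrative rather than taken from the original setup, and the call assumes pg_partman 4.x (where p_type is still a parameter):
-- Illustrative: monthly native partitioning on sent_at,
-- with 4 partitions pre-created ahead of the current month
SELECT partman.create_parent(
    p_parent_table => 'public.customer_comms',
    p_control      => 'sent_at',
    p_type         => 'native',
    p_interval     => 'monthly',
    p_premake      => 4
);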
Observe carefully that our partitions are exhausted as of February 2023. If our application inserts data into the table in February 2023, it goes to the default partition, customer_comms_default. While the writes are preserved thanks to the default partition, reads from the customer_comms table will certainly slow down.
The reads are slow because all the data going into the table now sits in a single partition instead of being spread over multiple partitions. If we do not create partitions for the coming months, our reads will become very slow, depending on the amount of data we insert into the table.
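A quick way to confirm that rows are piling up in the default partition is to count them directly, using the table names from our example:
-- Rows that could not be routed to a dedicated partition land here
SELECT count(*) FROM public.customer_comms_default;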
An obvious solution comes to mind: create new partitions to accommodate the future data. We know that partitions can be created on an existing table using the run_maintenance() function of pg_partman. But when we try to run it, we come across an error like the one below:
acme_marketing=# select * from partman.run_maintenance
('public.customer_comms',true,true);
ERROR: updated partition constraint for default partition
"customer_comms_default" would be violated by some row
CONTEXT: SQL statement "ALTER TABLE public.customer_comms
ATTACH PARTITION public.customer_comms_p2023_02 FOR VALUES
FROM ('2023-02-01 00:00:00+05:30') TO ('2023-03-01 00:00:00+05:30')"
The error says that the new partition could not be attached because rows that belong to its range already exist in the default partition. This is the difficult situation we want to get out of: the application keeps writing to the partitioned table at high volume, all of it lands in the default partition, and reads keep getting slower.
The Solution
As we have seen, the problem is serious, but the solution is simple and straightforward. The steps outlined below will help you get out of such a situation:
Step 1: Stop the Writes
Stop incoming writes to the table for a brief period of time. Stopping the writes ensures that no more data lands in the default partition. We can divert our writes to a queue and later read from the queue and insert the data. Alternatively, we can take the application down, but that is the worst-case option.
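If diverting writes to a queue is not feasible, one lightweight way to stop writes at the database level is to revoke INSERT on the parent table. This is only a sketch and assumes the application connects as a dedicated role, here named app_user (a hypothetical name):
-- app_user is hypothetical; substitute your application's DB user
REVOKE INSERT ON public.customer_comms FROM app_user;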
Step 2: Move Data from the Default Partition
Now that writes to our partitioned table are stopped, we can safely move the data out of the default partition. We can create a temp table, move all the default partition's data into it, and then truncate the default partition.
Here is an example of how we would do it in the case of our customer_comms table:
-- CREATE TEMP TABLE
CREATE TABLE customer_comms_temp AS
SELECT * FROM public.customer_comms_default;
-- TRUNCATE DEFAULT TABLE
TRUNCATE TABLE public.customer_comms_default;
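It is worth sanity-checking the move before proceeding. A minimal verification, using our example's table names:
-- Should equal the number of rows that were in the default partition
SELECT count(*) FROM customer_comms_temp;
-- Should return 0 after the truncate
SELECT count(*) FROM public.customer_comms_default;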
Step 3: Run Maintenance
We can now run maintenance on our table, and it should succeed. As there is no more data in the default partition, the earlier error will not come up. Here is how to run maintenance on the table in our example:
acme_marketing=# select partman.run_maintenance('public.customer_comms', true, true);
run_maintenance
-----------------
(1 row)
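pg_partman also ships a show_partitions() function that lists the child tables it manages, which gives a quick confirmation that the new partitions exist before inspecting the table in the next step:
SELECT * FROM partman.show_partitions('public.customer_comms');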
Step 4: Check Your Partitions
Now that the maintenance command has succeeded, we can check our partitions and ensure they were created properly. Here is what our customer_comms table looks like now:
acme_marketing=# \d+ customer_comms;
Partitioned table "public.customer_comms"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
-------------+--------------------------+-----------+----------+---------------------------------------------------+----------+-------------+--------------+-------------
id | bigint | | not null | nextval('customer_comms_parent_id_seq'::regclass) | plain | | |
customer_id | bigint | | not null | | plain | | |
user_id | bigint | | not null | | plain | | |
title | character varying(500) | | not null | | extended | | |
body | text[] | | not null | | extended | | |
sent_at | timestamp with time zone | | not null | now() | plain | | |
Partition key: RANGE (sent_at)
Indexes:
"customer_comms_sent_at_idx" btree (sent_at)
Partitions: customer_comms_p2022_10 FOR VALUES FROM ('2022-10-01 00:00:00+05:30') TO ('2022-11-01 00:00:00+05:30'),
customer_comms_p2022_11 FOR VALUES FROM ('2022-11-01 00:00:00+05:30') TO ('2022-12-01 00:00:00+05:30'),
customer_comms_p2022_12 FOR VALUES FROM ('2022-12-01 00:00:00+05:30') TO ('2023-01-01 00:00:00+05:30'),
customer_comms_p2023_01 FOR VALUES FROM ('2023-01-01 00:00:00+05:30') TO ('2023-02-01 00:00:00+05:30'),
customer_comms_p2023_02 FOR VALUES FROM ('2023-02-01 00:00:00+05:30') TO ('2023-03-01 00:00:00+05:30'),
customer_comms_p2023_03 FOR VALUES FROM ('2023-03-01 00:00:00+05:30') TO ('2023-04-01 00:00:00+05:30'),
customer_comms_p2023_04 FOR VALUES FROM ('2023-04-01 00:00:00+05:30') TO ('2023-05-01 00:00:00+05:30'),
customer_comms_p2023_05 FOR VALUES FROM ('2023-05-01 00:00:00+05:30') TO ('2023-06-01 00:00:00+05:30'),
customer_comms_default DEFAULT
Step 5: Copy the Data Back
Now that we have recovered from the error, we can copy the data back into our partitions. Remember the temp table we created in Step 2? We need to add an index to it on the partition column so that the range queries we use to move the data back into the partitions stay efficient.
For our example, we create an index on the sent_at column of the temp table, since sent_at is our partition column. Here is how to do it:
acme_marketing=# CREATE INDEX sent_at_idx
ON customer_comms_temp(sent_at);
CREATE INDEX
We can now query the temp table and insert the data back into our partitions. Here is how we do it for our customer_comms table:
INSERT INTO public.customer_comms
SELECT * FROM public.customer_comms_temp
WHERE sent_at >= '2023-02-01 00:00:00+05:30' AND sent_at < '2023-03-01 00:00:00+05:30';
INSERT 0 100000
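Repeating that INSERT by hand for every month gets tedious. A small PL/pgSQL loop can walk the remaining ranges instead; this is a sketch whose date bounds match our example and would need adjusting to your own data:
-- Move the default-partition data back month by month
DO $$
DECLARE
    month_start timestamptz := '2023-02-01 00:00:00+05:30';
BEGIN
    WHILE month_start < timestamptz '2023-06-01 00:00:00+05:30' LOOP
        INSERT INTO public.customer_comms
        SELECT * FROM public.customer_comms_temp
        WHERE sent_at >= month_start
          AND sent_at < month_start + interval '1 month';
        month_start := month_start + interval '1 month';
    END LOOP;
END $$;
Once the row counts in the partitions match the temp table, the temp table can be dropped with DROP TABLE customer_comms_temp;.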
Step 6: Restart the Writes
Our application has successfully recovered from the partitioning disaster. We can now restart writes to the table by reading from the queue or bringing the application back up. Be sure to set up monitoring for partitions as described in this post.
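If writes were blocked with a REVOKE as sketched in Step 1, the matching GRANT restores them (app_user is again the hypothetical role name):
GRANT INSERT ON public.customer_comms TO app_user;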
Conclusion
Partitioning with pg_partman is an excellent solution for managing our data in the long run. When we know the data is going to grow very large, it makes sense to partition it for the sake of faster reads. However, if we neglect the maintenance part, partitioning can become a business-stopping problem. A proper maintenance process for Postgres partitions is therefore necessary.
If we fail to maintain the partitions, we may have to stop our business application and fix them as outlined in the solution above.