PostgreSQL BiDirectional Replication
Previously we ran Debezium in Embedded mode. In this blog, we shall focus on replication and, more specifically, bidirectional replication.
Join the DZone community and get the full member experience.
Join For FreeAs you can understand from my previous blogs I am really into PostgreSQL.
Previously we ran Debezium in Embedded mode. Behind the scenes, Debezium consumes the changes that were committed to the transaction log. This happens by utilizing the logical decoding feature of PostgreSQL.
In this blog, we shall focus on replication and more specifically bidirectional replication. To achieve bidirectional replication in PostgreSQL we need the module pglogical. You might wonder about the difference between logical decoding and pglogical. Essentially, logical decoding has its origins in PgLocigal. View PgLocial as a more featureful module while logical decoding is embedded in a PostgreSQL distribution.
We will create a custom PostgreSQL Docker image and install PgLogical.
# Use the official PostgreSQL image as base
FROM postgres:15
USER root
RUN apt-get update; apt-get install postgresql-15-pglogical -y
USER postgres
Also, we need to have a PostgreSQL configuration that will enable PgLogical replication and conflict resolution.
listen_addresses = '*'
port = 5432
max_connections = 20
shared_buffers = 128MB
temp_buffers = 8MB
work_mem = 4MB
wal_level = logical
max_wal_senders = 3
track_commit_timestamp = on
shared_preload_libraries = 'pglogical'
pglogical.conflict_resolution = 'first_update_wins'
Let’s break this down. We added pglogical and we enabled track_commit_timestamp
. By enabling this parameter PostgreSQL tracks the commit time of transactions. This will be crucial for the conflict resolution strategy.
Now let’s see the conflict resolution. We selected ‘first_update_wins’
; therefore, in case of two transactions operating on the same row, the transaction that finished first will be the one to be considered.
Bidirectional replication is set up upon a table. Since we use Docker we shall provide an initialization script to PostgreSQL.
The script will:
- Enable pglogical
- Create the table
- Add a target node
- Insert the row we shall run tests upon
#!/bin/bash
set -e
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
ALTER SYSTEM RESET shared_preload_libraries;
CREATE EXTENSION pglogical;
create schema test_schema;
create table test_schema.employee(
id SERIAL PRIMARY KEY,
firstname TEXT NOT NULL,
lastname TEXT NOT NULL,
email TEXT not null,
age INT NOT NULL,
salary real,
unique(email)
);
SELECT pglogical.create_node(
node_name := '$TARGET',
dsn := 'host=$TARGET port=5432 dbname=$POSTGRES_DB user=$POSTGRES_USER password=$POSTGRES_PASSWORD');
SELECT pglogical.replication_set_add_table('default', 'test_schema.employee', true);
insert into test_schema.employee (id,firstname,lastname,email,age,salary) values (1,'John','Doe 1','john1@doe.com',18,1234.23);
EOSQL
Let’s create the instances now using Docker Compose.
version: '3.1'
services:
postgres-a:
build: ./pglogicalimage
restart: always
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
TARGET: postgres-b
volumes:
- ./config/postgresql.conf:/etc/postgresql/postgresql.conf
- ./init:/docker-entrypoint-initdb.d
command:
- "-c"
- "config_file=/etc/postgresql/postgresql.conf"
ports:
- 5431:5432
postgres-b:
build: ./pglogicalimage
restart: always
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
TARGET: postgres-a
volumes:
- ./config/postgresql.conf:/etc/postgresql/postgresql.conf
- ./init:/docker-entrypoint-initdb.d
command:
- "-c"
- "config_file=/etc/postgresql/postgresql.conf"
ports:
- 5432:5432
We can get our instances up and running by issuing:
docker compose up
Docker Compose V2 is out there with many good features, you can find more about it in the book I authored: A Developer’s Essential Guide to Docker Compose.
Since both instances are up and running we need to enable the replication. Therefore we shall subscribe the nodes to each other.
Execute on the first node:
SELECT pglogical.create_subscription(
subscription_name := 'postgres_b',
provider_dsn := 'host=postgres-b port=5432 dbname=postgres user=postgres password=postgres',
synchronize_data := false,
forward_origins := '{}' );
Execute at the second node:
SELECT pglogical.create_subscription(
subscription_name := 'postgres_a',
provider_dsn := 'host=postgres-a port=5432 dbname=postgres user=postgres password=postgres',
synchronize_data := false,
forward_origins := '{}' );
You can use any PostgreSQL client that suits you. Alternatively, you can just use the psql client that comes packaged with the Docker Images.
For example:
Login to the first node:
docker compose exec postgres-a psql --username postgres --dbname postgres
Login to the second node:
docker compose
exec
postgres-b psql --username postgres --dbname postgres
Let’s see how conflict resolution will work now.
On the first node, we shall run the following snippet:
BEGIN;
UPDATE test_schema.employee SET lastname='first wins';
#before committing start transaction on postgres-b
COMMIT;
Don’t press commit immediately, instead take the time and before you commit the transaction start the following transaction on the second node.
BEGIN;
UPDATE test_schema.employee SET lastname='second looses';
#make sure transaction on node postgres-a is committed first.
COMMIT;
This transaction will be committed after the transaction that takes place in postgres-a.
Let’s check the logs on postgres-a-1:
postgres-a-1 | 2024-05-01 07:10:45.128 GMT [70] LOG: CONFLICT: remote UPDATE on relation test_schema.employee (local index employee_pkey). Resolution: keep_local.
postgres-a-1 | 2024-05-01 07:10:45.128 GMT [70] DETAIL: existing local tuple {id[int4]:1 firstname[text]:John lastname[text]:first wins email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} xid=748,origin=0,timestamp=2024-05-01 07:10:42.269227+00; remote tuple {id[int4]:1 firstname[text]:John lastname[text]:second looses email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} in xact origin=1,timestamp=2024-05-01 07:10:45.125791+00,commit_lsn=0/16181C0
postgres-a-1 | 2024-05-01 07:10:45.128 GMT [70] CONTEXT: apply UPDATE from remote relation test_schema.employee in commit before 0/16181C0, xid 747 committed at 2024-05-01 07:10:45.125791+00 (action #2) from node replorigin 1
The transaction that took place on postgres-a finished first. Postgres-a received the replication data from the transaction of node postgres-b. A comparison was issued on the commit timestamp because the commit timestamp of the transaction on postgres-a was earlier the resolution was to keep the local changes.
We can see the reverse on postgres-b:
postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] LOG: CONFLICT: remote UPDATE on relation test_schema.employee (local index employee_pkey). Resolution: apply_remote.
postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] DETAIL: existing local tuple {id[int4]:1 firstname[text]:John lastname[text]:second looses email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} xid=747,origin=0,timestamp=2024-05-01 07:10:45.125791+00; remote tuple {id[int4]:1 firstname[text]:John lastname[text]:first wins email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} in xact origin=1,timestamp=2024-05-01 07:10:42.269227+00,commit_lsn=0/1618488
postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] CONTEXT: apply UPDATE from remote relation test_schema.employee in commit before 0/1618488, xid 748 committed at 2024-05-01 07:10:42.269227+00 (action #2) from node replorigin 1
Let’s check the result in the database.
postgres=# SELECT*FROM test_schema.employee;
id | firstname | lastname | email | age | salary
----+-----------+------------+---------------+-----+---------
1 | John | first wins | john1@doe.com | 18 | 1234.23
As expected the first transaction is the one that stayed.
To wrap it up:
- We started two transactions in parallel
- We changed the same row
- We accepted the changes of the transaction that finished first
That’s it. Hope you had some fun and now you have another tool for your needs. In the next blog, we shall examine PostgreSQL’s driver capabilities and how we can configure an automated failover to another instance.
Published at DZone with permission of Emmanouil Gkatziouras, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments