PostgreSQL BiDirectional Replication

Previously we ran Debezium in Embedded mode. In this blog, we shall focus on replication and, more specifically, bidirectional replication.

Emmanouil Gkatziouras

CORE ·

Jun. 28, 24 · Tutorial

Likes (3)

Comment

Save

10.7K Views

As you can understand from my previous blogs I am really into PostgreSQL.

Previously we ran Debezium in Embedded mode. Behind the scenes, Debezium consumes the changes that were committed to the transaction log. This happens by utilizing the logical decoding feature of PostgreSQL.

In this blog, we shall focus on replication and more specifically bidirectional replication. To achieve bidirectional replication in PostgreSQL we need the module pglogical. You might wonder about the difference between logical decoding and pglogical. Essentially, logical decoding has its origins in PgLocigal. View PgLocial as a more featureful module while logical decoding is embedded in a PostgreSQL distribution.

We will create a custom PostgreSQL Docker image and install PgLogical.

    Shell
   
 

   # Use the official PostgreSQL image as base
FROM postgres:15
USER root
RUN apt-get update; apt-get install postgresql-15-pglogical -y
USER postgres
  

Also, we need to have a PostgreSQL configuration that will enable PgLogical replication and conflict resolution.

    Shell
   
 

   listen_addresses = '*'
port = 5432
max_connections = 20
shared_buffers = 128MB
temp_buffers = 8MB
work_mem = 4MB
wal_level = logical
max_wal_senders = 3
track_commit_timestamp = on
shared_preload_libraries = 'pglogical'
pglogical.conflict_resolution = 'first_update_wins'
  

Let’s break this down. We added pglogical and we enabled track_commit_timestamp. By enabling this parameter PostgreSQL tracks the commit time of transactions. This will be crucial for the conflict resolution strategy.

Now let’s see the conflict resolution. We selected ‘first_update_wins’; therefore, in case of two transactions operating on the same row, the transaction that finished first will be the one to be considered.

Bidirectional replication is set up upon a table. Since we use Docker we shall provide an initialization script to PostgreSQL.

The script will:

Enable pglogical
Create the table
Add a target node
Insert the row we shall run tests upon

    PLSQL
   
 

   #!/bin/bash
set -e
 
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
  ALTER SYSTEM RESET shared_preload_libraries;
  CREATE EXTENSION pglogical;
 
  create schema test_schema;
  create table test_schema.employee(
          id  SERIAL PRIMARY KEY,
          firstname   TEXT    NOT NULL,
          lastname    TEXT    NOT NULL,
          email       TEXT    not null,
          age         INT     NOT NULL,
          salary         real,
          unique(email)
      );
 
  SELECT pglogical.create_node(
      node_name := '$TARGET',
      dsn := 'host=$TARGET port=5432 dbname=$POSTGRES_DB user=$POSTGRES_USER password=$POSTGRES_PASSWORD');
 
  SELECT pglogical.replication_set_add_table('default', 'test_schema.employee', true);
 
  insert into test_schema.employee (id,firstname,lastname,email,age,salary) values (1,'John','Doe 1','john1@doe.com',18,1234.23);
 
 
EOSQL
  

Let’s create the instances now using Docker Compose.

    YAML
   
 

   version: '3.1'
 
services:
  postgres-a:
    build: ./pglogicalimage
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      TARGET: postgres-b
    volumes:
      - ./config/postgresql.conf:/etc/postgresql/postgresql.conf
      - ./init:/docker-entrypoint-initdb.d
    command:
      - "-c"
      - "config_file=/etc/postgresql/postgresql.conf"
    ports:
      - 5431:5432
  postgres-b:
    build: ./pglogicalimage
    restart: always
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      TARGET: postgres-a
    volumes:
      - ./config/postgresql.conf:/etc/postgresql/postgresql.conf
      - ./init:/docker-entrypoint-initdb.d
    command:
      - "-c"
      - "config_file=/etc/postgresql/postgresql.conf"
    ports:
      - 5432:5432
  

We can get our instances up and running by issuing:

docker compose up

Docker Compose V2 is out there with many good features, you can find more about it in the book I authored: A Developer’s Essential Guide to Docker Compose.

Since both instances are up and running we need to enable the replication. Therefore we shall subscribe the nodes to each other.

Execute on the first node:

    PLSQL
   
 

   SELECT pglogical.create_subscription(
  subscription_name := 'postgres_b',
  provider_dsn := 'host=postgres-b port=5432 dbname=postgres user=postgres password=postgres',
  synchronize_data := false,
  forward_origins := '{}' );
  

Execute at the second node:

    PLSQL
   
 

   SELECT pglogical.create_subscription(
  subscription_name := 'postgres_a',
  provider_dsn := 'host=postgres-a port=5432 dbname=postgres user=postgres password=postgres',
  synchronize_data := false,
  forward_origins := '{}' );
  

You can use any PostgreSQL client that suits you. Alternatively, you can just use the psql client that comes packaged with the Docker Images.

For example:

docker compose exec postgres-a psql --username postgres --dbname postgres

docker compose exec postgres-b psql --username postgres --dbname postgres

Let’s see how conflict resolution will work now.

On the first node, we shall run the following snippet:

    PLSQL
   
   BEGIN;
UPDATE test_schema.employee SET lastname='first wins';
 
#before committing start transaction on postgres-b
 
COMMIT;

Don’t press commit immediately, instead take the time and before you commit the transaction start the following transaction on the second node.

    PLSQL
   
   BEGIN;
UPDATE test_schema.employee SET lastname='second looses';
 
#make sure transaction on node postgres-a is committed first.
 
COMMIT;

This transaction will be committed after the transaction that takes place in postgres-a.

Let’s check the logs on postgres-a-1:

    PLSQL
   
   postgres-a-1  | 2024-05-01 07:10:45.128 GMT [70] LOG:  CONFLICT: remote UPDATE on relation test_schema.employee (local index employee_pkey). Resolution: keep_local.
postgres-a-1  | 2024-05-01 07:10:45.128 GMT [70] DETAIL:  existing local tuple {id[int4]:1 firstname[text]:John lastname[text]:first wins email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} xid=748,origin=0,timestamp=2024-05-01 07:10:42.269227+00; remote tuple {id[int4]:1 firstname[text]:John lastname[text]:second looses email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} in xact origin=1,timestamp=2024-05-01 07:10:45.125791+00,commit_lsn=0/16181C0
postgres-a-1  | 2024-05-01 07:10:45.128 GMT [70] CONTEXT:  apply UPDATE from remote relation test_schema.employee in commit before 0/16181C0, xid 747 committed at 2024-05-01 07:10:45.125791+00 (action #2) from node replorigin 1

The transaction that took place on postgres-a finished first. Postgres-a received the replication data from the transaction of node postgres-b. A comparison was issued on the commit timestamp because the commit timestamp of the transaction on postgres-a was earlier the resolution was to keep the local changes.

We can see the reverse on postgres-b:

    PLSQL
   
   postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] LOG: CONFLICT: remote UPDATE on relation test_schema.employee (local index employee_pkey). Resolution: apply_remote.
postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] DETAIL: existing local tuple {id[int4]:1 firstname[text]:John lastname[text]:second looses email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} xid=747,origin=0,timestamp=2024-05-01 07:10:45.125791+00; remote tuple {id[int4]:1 firstname[text]:John lastname[text]:first wins email[text]:john1@doe.com age[int4]:18 salary[float4]:1234.23} in xact origin=1,timestamp=2024-05-01 07:10:42.269227+00,commit_lsn=0/1618488
postgres-b-1 | 2024-05-01 07:10:45.127 GMT [81] CONTEXT: apply UPDATE from remote relation test_schema.employee in commit before 0/1618488, xid 748 committed at 2024-05-01 07:10:42.269227+00 (action #2) from node replorigin 1

Let’s check the result in the database.

    PLSQL
   
   postgres=# SELECT*FROM test_schema.employee;
 id | firstname |  lastname  |     email     | age | salary  
----+-----------+------------+---------------+-----+---------
  1 | John      | first wins | john1@doe.com |  18 | 1234.23

As expected the first transaction is the one that stayed.

To wrap it up:

We started two transactions in parallel
We changed the same row
We accepted the changes of the transaction that finished first

That’s it. Hope you had some fun and now you have another tool for your needs. In the next blog, we shall examine PostgreSQL’s driver capabilities and how we can configure an automated failover to another instance.

Replication (computing) PostgreSQL

Published at DZone with permission of Emmanouil Gkatziouras, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

Trending

PostgreSQL BiDirectional Replication

Previously we ran Debezium in Embedded mode. In this blog, we shall focus on replication and, more specifically, bidirectional replication.

Related

Partner Resources