Considering Distributed Postgres? How To Evaluate Vendor Claims of Postgres Compatibility
When evaluating a distributed Postgres database system careful consideration should be given to the level of effort required to migrate application code.
Join the DZone community and get the full member experience.
Join For FreeA number of distributed database vendors make claims about being “Postgres compatible” or “Postgres based.” This is no doubt a response to the growing and overwhelming popularity of Postgres with developers.
When evaluating a distributed Postgres database system careful consideration should be given to the level of effort required to migrate application code. Buyers should also look at the degree to which the product is outside the mainstream of the Postgres ecosystem, and the implications this raises.
Here are seven key questions and considerations:
1. Relationship With the PostgreSQL Code Base
In order to fully understand the basis of the vendor’s claims about Postgres begin by investigating the product’s exact relationship with the PostgreSQL code base.
For open-source products, you can simply examine their source code in their GitHub repository. For a proprietary product ask the vendor to describe how their product is built on Postgres (or not), and how it delivers the claimed compatibility.
Either way, the questions you need to ask here are:
- Is the product’s core distributed functionality delivered by way of a standard extension (or extensions) to core Postgres itself?
- Alternatively, is the product a fork of the standard PostgreSQL code base, and how much does this deviate from it (a “hard fork”)?
- Does the product require any patches to be applied to standard PostgreSQL? Is the source code for them available?
- Does the product incorporate PostgreSQL code by way of a straight copy of code into the product’s own code base? How recently has this been updated? i.e. what version of Postgres was utilized in this way? Is it still supported by the Postgres community?
- If the product is not making use of any Postgres code, how exactly is the claimed level of Postgres compatibility delivered?
pgEdge Distributed PostgreSQL is built 100% on standard PostgreSQL, and the core distributed functionality is implemented via a standard Postgres extension called Spock. Source code is available on the pgEdge Github page for all components of pgEdge Distributed PostgreSQL.
2. Type of Postgres Compatibility
Claims of Postgres compatibility fall into the following three categories.
Wire Protocol Compatible
PostgreSQL has a well-documented protocol for passing SQL commands and their results between the client and server. Certain distributed database products have adopted the Postgres wire protocol and this allows them to use the same language drivers as Postgres.
However, this offers little in the way of true Postgres compatibility, since the SQL syntax and semantics supported will typically be quite different from standard Postgres. Postgres applications can connect with the target database but very quickly exhibit SQL syntax errors and unexpected behaviors.
Syntax Compatible
This is the next level of Postgres compatibility. Does the product accept all Postgres SQL commands, or at least a subset, and execute them with the same semantics as standard Postgres?
Does the product support Postgres functions and stored procedures, and the same wide variety of programming languages that Postgres and various extensions do? If you make extensive use of stored procedures, you could face substantial code redevelopment.
Vendors claiming Postgres syntax compatibility have a page on their website or in their documentation describing differences and missing features between their product’s SQL implementation and that of standard Postgres. Use this to determine if a meaningful amount of code migration overhead is likely.
Fully Postgres Based
Fully Postgres-based products typically package standard Postgres along with the extensions and patches required for distributed operation. This will be fully evident in the product’s code base, assuming the source code is accessible.
These products are in almost all respects the same as stand-alone Postgres, with the same syntax, semantics, and behaviors of standard Postgres, subject to limitations imposed by the distributed architecture of the product (see below). It doesn’t make too much sense to call them “Postgres compatible” when they are fully based on Postgres.
Distributed Postgres products fully based on standard Postgres take advantage of virtually all the extensions, tooling, and add-on products available in the large and growing Postgres ecosystem. And just as importantly, application code in many cases does not require rework, although substantial testing is of course recommended.
3. Support for New PostgreSQL Versions
Determine how soon support for new Postgres major versions is incorporated into the product. This should be shortly after the annual Postgres major version release each September or October. Some distributed database products do not add support for new major versions of Postgres, locking the buyer out of the substantial innovation and improvements in each new Postgres major version release.
4. Support for PostgreSQL Extensions
A major contributing factor to the success of Postgres is its extensible architecture and the thousands of extensions developed for it. Some of these extensions allow Postgres to function as an entirely different type of database: e.g. the PostGIS extension turns Postgres into a spatial database, and pgvector turns it into a vector database.
The buyer should determine the level of support in the product – either none, limited, or full – for Postgres extensions. Do extensions just work, as with standard Postgres, or do they require the vendor to do engineering to support the extension?
5. Implications of Distributed Database Architecture on Full Postgres Compatibility
Constraints of distributed databases may place limits on the Postgres features and behaviors that can be supported by the products. “CP” databases prioritize full consistency over availability and low latency. “AP” databases prioritize availability and low latency and provide eventual consistency.
CP Databases in the market currently include CockroachDB, YugabyteDB, and Google Cloud Spanner. AP databases include pgEdge, EDB Postgres Distributed, and numerous NoSQL databases such as MongoDB, Cassandra, and DynamoDB. Constraints of CP databases place limits on the Postgres features that can be supported by the products. Again, consult the vendor documentation for details on differences with standard Postgres.
Constraints of AP databases require that the application(s) can run in an eventually consistent model. The good news is that many can, as evidenced by the popularity of NoSQL AP databases such as MongoDB and Cassandra. It is also worth noting that at each node in a geographically distributed AP cluster, standard Postgres consistency, isolation levels, and ACID compliance are seen among clients connecting to that node.
6. Advantages of the Broad Postgres Community and Ecosystem
A vibrant developer community allows Postgres to continue to evolve and deliver a steady stream of innovation. Tapping into this community-led development is one of the reasons organizations adopt Postgres.
If a distributed database product is not Postgres based and therefore not a part of the Postgres mainstream, and instead is driven by a single company (even if it is open source), organizations buying the product will no longer be able to benefit from the innovations coming from the Postgres community.
7. Determine the Level of Effort Required To Migrate to the Product
Finally, and most importantly, you need to assess how much development work is required to migrate your applications to the product. Run a pilot project that includes migration of all or part of a key application to run on at least two of the shortlisted products. Look at the level of effort required to migrate the code used in the pilot and extrapolate to an overall level of migration effort.
Using the above questions and considerations in your vendor assessment will guide you to the product that best fits your situation, depending on the importance of Distributed Postgres to you and your organization.
Published at DZone with permission of Phillip Merrick. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments