Hooks: The Secret Feature Powering the Postgres Ecosystem
Postgres Hooks are a powerful way to customize how Postgres works. Taking examples from popular projects we explore how to build with them.
Join the DZone community and get the full member experience.
Join For FreeWhat do developers mean when they say Postgres is "extensible"? They're referring to low-level APIs which can change the core functionality of the database. In this post, we will explore a secret - meaning undocumented - feature called hooks which allow developers to not only add features to Postgres, but modify the way that queries are executed and data is parsed.
Hooks are used by some of the most popular projects in the Postgres ecosystem, including Timescale, pg_stat_statements, and Supabase. To give you ideas of what can be built, we will see how each of these projects uses specific hooks. At the end of this post, we provide a Makefile, some C code, and compilation instructions to get started customizing Postgres on your own.
What Are Hooks?
Hooks are event-based functions present in Postgres. We can create a hook based on an event/trigger, and the database will call it. Hooks are used extensively by many libraries and projects but they all share the same basic structure.
Structure
A Postgres hook begins by including the Postgres headers and providing some standard wrappers that the database engine knows about.
include "postgres.h"
// Hooks Code
static void custom_hook()
// All hooks are called from this function
void _PG_init(void);
// Hooks logic end in this function
void _PG_fini(void);
Initialization
Hooks need to be placed in the contrib
Postgres source code path folder. You can check the source path present here for extensions already shipped with Postgres.
Once the hook code is in place, you can build it from the source or ship the attachment separately. Making the hook as an extension is beyond the scope of this blog article, but it is how all 3 projects covered here use them. Once the plugin is in place, you can then load the plugin using LOAD 'plugin_name'
to test it.
Types of Hooks
There are roughly 30 hooks across 6 categories that can be called in the C programming language. The psql-hooks project on Github is the unofficial documentation of all of the functionality.
- General Hooks - Hooks that work with general Postgres functionality.
- Security Hooks - Hooks with specific security functions such as password and user creation.
- Function Hooks - Work during function execution.
- Planner Hooks - Interception during the planner phase. For example, Timescale uses the
planner_hook
to change the query execution plan. - Executor Hooks - Interception during the execution phase. For example, pg_stat_statements uses the
ExecutorRun_hook
to inspect running queries and Supabase uses theProcessUtility_hook
to implement role based security. - PL/PG SQL Hooks - Handy when working with PL/PGSql functions.
Most documentation is better with examples, so let's review how Timescale, pg_stat_statements, and Supabase use hooks. For extra fun, read the linked C files below for sage advice from their developers on how to use these low-level APIs.
Hooks in Timescale
TimescaleDB extends Postgres "by adding hooks deep into PostgreSQL's query planner, data and storage model, and execution engine". This gives us a hint as to what parts of the database architecture they are plugging into. In particular, Timescale deals with time-series data in chunks, bits of memory where a continuous set of dates, times, events, or other time-series data is stored. A second core feature is continuous aggregates, where chunks are updated as new data comes in. Finally, Timescale has some very interesting scheduling features, again directly implemented in Postgres using hooks. Now that we know what types of hooks exist, we can examine Timescale's open-source codebase for how they are used to create the features above.
Query Planning and the Execution Engine
The most important hook for Timescale is probably the planner_hook
which deals with the query plan that Postgres produces when a user or application sends it a SQL statement. In src/planner.c
we see how they use the function timescaledb_planner
to modify the typical query plan to include chunks and ensure that HyperTables are queried correctly. Once the planner has completed Timescale does further transformations on the query using the post_parse_analyze_hook
that they initialized at the very start of Postgres operation in src/loader/loader.c
.
Likewise, in src/process_utility.c
, the ProcessUtility
hook is used to modify the execution engine of Postgres once the query plan has been generated. This file, over 4,000 lines of C, ensures that statements like ALTER TABLE get executed correctly and Timescale's unique compression features are utilized for maximum effect.
Housekeeping
There are two other interesting hooks outside of planning and execution that help Timescale ensure smooth updates and test their extension. shmem_startup_hook
fires at initialization and is used to ensure that workers have the right version of the extension during updates, that scheduling happens correctly, and other tasks. And all of this functionality couldn't go without testing, so the emit_log_hook
is used in src/bgw/log.c
to inject logs in the automated test harness.
Hooks in pg_stat_statements
Citus Data called pg_stat_statements the most useful postgres extension. It is one of the best tools for Postgres performance monitoring and can be turned on easily. pg_stat_statements makes heavy use of the ExecutorStart_hook
, the ExecutorRun_hook
, the ExecutorFinish_hook
, and the ExecutorEnd_hook
because it is interested in information about the runtime of the query. Here is an example from pg_stat_statements.c
where we track the total elapsed time of a query.
Hooks in Supabase
We could cover in detail how Supabase - the open-source Firebase implemented in Postgres - uses hooks but they have done so themselves in an excellent blog post on protecting reserved roles in Postgres. Suffice to say they use only one hook, the ProcessUtility_hook
to hijack (in a good way) a global pointer that hits right before every SQL statement is run.
Supabase uses ProcessUtility to jump into supautils.c
, a library which checks if the role that is executing the statement is allowed to do so. This helps them match the web-based authentication on supabase.com to the database role inside Postgres and provide a very smooth authentication experience. Beyond reading the pg_stat_statements C code, their blog post and supautils.c
is the best reference for implementing a moderately complex hook.
A fully-working example
Let's look at a fully-working example found in the Github repository here. Note: The license for the code below is available here.
The above piece of code is an elementary demonstration of hooks. When authentication is wrong, it introduces a sleep of 1 second before throwing an error. This code can be used as a deterrent for brute-forcing the password (this is just an example and not a security functionality/strategy that is recommended).
Other Event-Based Architectures for Postgres
There are other architectures to harvest events from Postgres. Apache Kafka using Debezium is a popular one. Postgres also has a commit log for transactions that can be watched and replicated across servers for redundancy or backups.
Conclusion
Hooks are a pretty advanced concept. A typical database user will never use it in their everyday work, but it's fun to understand how they work because you will never know when they might come in handy. In addition, some libraries already use them since they need to hook in for various metrics.
A rough strategy that you can use to see if hooks are suitable for your project involves considering the following points:
- Is it an event-based use case?
- It cannot be done on the application side and only on the database side. Is that ok?
- Can we not use CDC (Change Data Capture) tools to solve this?
- Do you have context about
C
programming language and the required knowledge about Postgres internals? - Do you have the ability to package Postgres as a custom installation? Cloud installations and services such as RDS/CloudSQL will not have this ability.
If these questions have been answered, hooks are an excellent use case.
Published at DZone with permission of Everett Berry. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments