Shared Schema Strategy With Postgres (Part 1)
Apply the shared schema strategy in Postgres by using the row-level security policy.
Join the DZone community and get the full member experience.
Join For FreeThe shared schema strategy is one of the main multi-tenancy strategies besides the separate database or separate schema. Although there is no limitation, there has to be only one schema in the database, but all tenants should have the same access. The strategy assumes that all tables in a database (except tables that store vocabulary data or data available for all tenants) have a column that stores tenant identifiers. Based on this value column, we know which tenant is the owner of the table row.
Obviously, this approach offers a lower isolation level than the separate database and shared schema strategies but, just like the shared schema strategy, saves infrastructure costs. In comparison to the shared schema strategy, we can say that, although Postgres can handle multiple schemas, we can find comments from people who use this strategy with Postgres for a larger number of tenants. The maintenance for this approach, however, is harder than when the schema is shared between tenants. Executing DDL scripts for hundreds of schemas might be easily automated, but still, it might create complexity and slower deployment. It does not mean that executing DDL scripts for a database with a shared schema approach is error-free.
Implementation
The shared schema strategy assumes that all tables shared by the tenant have a column that defines its owner. This means that the value of this column has to be checked in each SQL query and operation. The column must be checked during the SELECT statement execution as well as all operations that modify data like UPDATE, INSERT and DELETE.
One approach would be that the application should handle these checks. In every SQL statement sent to the database, the application must attach a condition that checks the tenant column's value. It might be easier if the application uses some ORM framework. But still, if the application code contains any custom query, the developer has to be aware that he needs to add checks for the tenant identifier column. The pros for this approach are that if we would ever consider changing the database engine for the application, it would be quick and easy. However, there are cons for such a situation. Say, for example, we had many applications that use the same database, then ll applications should have implemented tenant column checks; not to mention if those applications are written in different programming languages.
The Implementation That Uses a Database Engine
Another approach similar to the previous one, but from a different side, is when the database engine is doing tenant column value checks. Postgres has a feature called the row security policy that allows for the implementation of such a solution. Probably the biggest advantage of this approach is that it takes a huge amount of responsibility from the application code. Of course, the application needs to implement how to create connections per specific tenants. But, despite that, the programmer does not have to worry about the tenant column's missing condition in the SQL statement unless the SQL statement is so custom that the developer changes the tenant during its execution.
Implementation Tips
No matter if we decide to choose the application approach or database approach, it needs to be mentioned that our database constraints need to be wisely designed, especially those constraints that check if the record values are unique. Designing table primary keys that would contain tenant columns is also worth considering; at least for tables that store data of different tenants. Dictionary tables used by all tenants don't need to have such requirements.
Posmulten
Posmulten is an open-source project that generates DDL statements that create a shared schema strategy for your project that uses the Postgres database. Posmulten uses a row security policy mechanism to handle tenant data isolation. Below there's a script example with DDL statements that create a simple schema. In this script, all tables have a primary key, which part is the tenant column. There is also the possibility to generate a shared schema strategy for a model where the tenant column is not a part of the primary key. Such an approach might be easier to adopt to existed applications. This approach will be described in the next part of this article.
x
CREATE TABLE public.user_info
(
id bigint NOT NULL,
tenant character varying(255) COLLATE pg_catalog."default" NOT NULL,
name character varying(32) COLLATE pg_catalog."default" NOT NULL,
manager_id bigint,
CONSTRAINT user_info_pkey PRIMARY KEY (id, tenant),
CONSTRAINT manager_fk FOREIGN KEY (manager_id, tenant)
REFERENCES public.user_info (id, tenant) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE public.user_info
OWNER to "postgresql-core-owner";
CREATE TABLE public.groups
(
id bigint NOT NULL,
tenant character varying(255) COLLATE pg_catalog."default" NOT NULL,
name character varying(255) COLLATE pg_catalog."default" NOT NULL,
CONSTRAINT groups_pkey PRIMARY KEY (id, tenant)
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE public.groups
OWNER to "postgresql-core-owner";
CREATE TABLE public.posts
(
id bigint NOT NULL,
tenant character varying(255) COLLATE pg_catalog."default" NOT NULL,
name character varying(255) COLLATE pg_catalog."default" NOT NULL,
user_id bigint,
CONSTRAINT posts_pkey PRIMARY KEY (id, tenant),
CONSTRAINT users_fk FOREIGN KEY (tenant, user_id)
REFERENCES public.user_info (tenant, id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE public.posts
OWNER to "postgresql-core-owner";
CREATE TABLE public.user_groups
(
user_id bigint,
group_id bigint,
tenant character varying(255) COLLATE pg_catalog."default",
CONSTRAINT group_fk FOREIGN KEY (group_id, tenant)
REFERENCES public.groups (id, tenant) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION,
CONSTRAINT user_fk FOREIGN KEY (tenant, user_id)
REFERENCES public.user_info (tenant, id) MATCH SIMPLE
ON UPDATE NO ACTION
ON DELETE NO ACTION
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;
ALTER TABLE public.user_groups
OWNER to "postgresql-core-owner";
As Posmulten is a Java library, there is a wrapper script that us it, and based on the configuration file, it creates two files with DDL statements (create_script.sql and drop_script.sql).
The drop_script.sql file contains statements that drop all changes applied by the create_script.sql file. Be in mind that those instructions are not compensations operations. They do not restore state before applying the create_script.sql script. Below there is an example of a configuration file. Please check the site for project configuration-yaml-interpreter to know how to prepare a configuration file.
xxxxxxxxxx
default_schema public
current_tenant_id_property_type"VARCHAR(255)"
default_tenant_id_column"tenant"
grantee"postgresql-core-user"
set_current_tenant_identifier_as_default_value_for_tenant_column_in_all_tablestrue
valid_tenant_value_constraint
is_tenant_valid_function_name is_t_valid
is_tenant_valid_constraint_name"is_tenant_valid_constraint_SDFA"
tenant_identifiers_blacklist
"DUMMY_TENANT"
tables
name user_info
rls_policy
name users_table_rls_policy
name posts
rls_policy
name"posts_table_rls_policy"
name groups
rls_policy
name groups_table_rls_policy
name user_groups
rls_policy
name user_groups_table_rls_policy
After executing the wrapper script for our configuration file:
x
posmulten-ddl.sh configuration.yaml
We get the create_script.sql script just like below:
xxxxxxxxxx
CREATE OR REPLACE FUNCTION public.get_current_tenant_id() RETURNS VARCHAR(255) AS $$
SELECT current_setting('posmulten.tenant_id')
$$ LANGUAGE sql
STABLE
PARALLEL SAFE;
CREATE OR REPLACE FUNCTION public.set_current_tenant_id(VARCHAR(255)) RETURNS VOID AS $$
BEGIN
PERFORM set_config('posmulten.tenant_id', $1, false);
END
$$ LANGUAGE plpgsql
VOLATILE;
CREATE OR REPLACE FUNCTION public.is_id_equals_current_tenant_id(VARCHAR(255)) RETURNS BOOLEAN AS $$
SELECT $1 = public.get_current_tenant_id()
$$ LANGUAGE sql
STABLE
PARALLEL SAFE;
CREATE OR REPLACE FUNCTION public.tenant_has_authorities(VARCHAR(255), VARCHAR(255), VARCHAR(255), VARCHAR(255), VARCHAR(255)) RETURNS BOOLEAN AS $$
SELECT public.is_id_equals_current_tenant_id($1)
$$ LANGUAGE sql
STABLE
PARALLEL SAFE;
CREATE OR REPLACE FUNCTION public.is_t_valid(VARCHAR(255)) RETURNS BOOLEAN AS $$
SELECT $1 <> CAST ('DUMMY_TENANT' AS VARCHAR(255))
$$ LANGUAGE sql
IMMUTABLE
PARALLEL SAFE;
ALTER TABLE public."user_info" ENABLE ROW LEVEL SECURITY;
ALTER TABLE public."user_groups" ENABLE ROW LEVEL SECURITY;
ALTER TABLE public."posts" ENABLE ROW LEVEL SECURITY;
ALTER TABLE public."groups" ENABLE ROW LEVEL SECURITY;
CREATE POLICY users_table_rls_policy ON public.user_info
FOR ALL
TO "postgresql-core-user"
USING (public.tenant_has_authorities(tenant, 'ALL', 'USING', 'user_info', 'public'))
WITH CHECK (public.tenant_has_authorities(tenant, 'ALL', 'WITH_CHECK', 'user_info', 'public'));
CREATE POLICY user_groups_table_rls_policy ON public.user_groups
FOR ALL
TO "postgresql-core-user"
USING (public.tenant_has_authorities(tenant, 'ALL', 'USING', 'user_groups', 'public'))
WITH CHECK (public.tenant_has_authorities(tenant, 'ALL', 'WITH_CHECK', 'user_groups', 'public'));
CREATE POLICY posts_table_rls_policy ON public.posts
FOR ALL
TO "postgresql-core-user"
USING (public.tenant_has_authorities(tenant, 'ALL', 'USING', 'posts', 'public'))
WITH CHECK (public.tenant_has_authorities(tenant, 'ALL', 'WITH_CHECK', 'posts', 'public'));
CREATE POLICY groups_table_rls_policy ON public.groups
FOR ALL
TO "postgresql-core-user"
USING (public.tenant_has_authorities(tenant, 'ALL', 'USING', 'groups', 'public'))
WITH CHECK (public.tenant_has_authorities(tenant, 'ALL', 'WITH_CHECK', 'groups', 'public'));
ALTER TABLE "public"."user_info" ADD CONSTRAINT is_tenant_valid_constraint_SDFA CHECK (tenant IS NULL OR public.is_t_valid(tenant));
ALTER TABLE "public"."user_groups" ADD CONSTRAINT is_tenant_valid_constraint_SDFA CHECK (tenant IS NULL OR public.is_t_valid(tenant));
ALTER TABLE "public"."posts" ADD CONSTRAINT is_tenant_valid_constraint_SDFA CHECK (tenant IS NULL OR public.is_t_valid(tenant));
ALTER TABLE "public"."groups" ADD CONSTRAINT is_tenant_valid_constraint_SDFA CHECK (tenant IS NULL OR public.is_t_valid(tenant));
ALTER TABLE public.user_info ALTER COLUMN tenant SET DEFAULT public.get_current_tenant_id();
ALTER TABLE public.user_groups ALTER COLUMN tenant SET DEFAULT public.get_current_tenant_id();
ALTER TABLE public.posts ALTER COLUMN tenant SET DEFAULT public.get_current_tenant_id();
ALTER TABLE public.groups ALTER COLUMN tenant SET DEFAULT public.get_current_tenant_id();
When the superuser or table owner can apply changes, the security policy is designed for the "postgresql-core-user" user, which should not be a superuser. However, the user can be the table owner. Now let's try to insert some data.
For this article's purpose, we assume that the "postgresql-core-user" user has privileges that allow modifying tables (INSERT, SELECT, UPDATE, DELETE) in the schema for which DDL statements were applied.
Insert Data
xxxxxxxxxx
SELECT set_current_tenant_id('XXX');
INSERT INTO user_info(id, tenant, name) VALUES (1, 'XXX', 'manager1');
INSERT INTO user_info(id, tenant, name, manager_id) VALUES (2, 'XXX', 'user1', 1);
INSERT INTO user_info(id, name) VALUES (3, 'user 2');
SELECT set_current_tenant_id('YYY');
INSERT INTO user_info(id, name) VALUES (1, 'manager1');
INSERT INTO user_info(id, name, manager_id) VALUES (2, 'user1', 1);
INSERT INTO user_info(id, name) VALUES (3, 'second user for YYY');
SELECT set_current_tenant_id('XXX');
INSERT INTO posts(id, tenant, name) VALUES (1, 'XXX', 'post 1');
INSERT INTO posts(id, tenant, name, user_id) VALUES (2, 'XXX', 'post 2', 1);
The above script is an example of how data should be inserted by the "postgresql-core-user" The first thing that catches our attention is calling the set_current_tenant_id(?) method. Btw, all names of methods generated by the posmulten library can be customized. The function sets a tenant identifier for the current connection session. You may notice that some INSERT instruction does not have specified the tenant column in the VALUES section. The posmulten library can generate the DDL statement that sets the default value defined for the tenant column.
Executing SQL statement without setting tenant:
Let's try to check what will happen when the tenant is not being set, and we are trying to add data.
xxxxxxxxxx
INSERT INTO user_info(id, tenant, name) VALUES (4, 'XXX', 'user4');
We can get such error:
xxxxxxxxxx
ERROR: unrecognized configuration parameter "posmulten.tenant_id"
SQL state: 42704
Executing SQL statement with setting different tenant:
Let's try to check what will happen when we set a different tenant than an INSERT statement:
xxxxxxxxxx
SELECT set_current_tenant_id('YYY');
INSERT INTO user_info(id, tenant, name) VALUES (4, 'XXX', 'user4');
We can get such error:
xxxxxxxxxx
ERROR: new row violates row-level security policy for table "user_info"
SQL state: 42501
Modify Data
Below there is an example of how to execute UPDATE instruction:
SELECT set_current_tenant_id('XXX');
UPDATE user_info SET name = 'main manager' WHERE id = 1
Output:
xxxxxxxxxx
UPDATE 1
Based on the output, we can see that one record was updated.
Executing UPDATE statement for data that belongs to the different tenant:
Below there is an example where we executed UPDATE instruction for a tenant that does not have any records. However, the rows with the specified value for id exists.
xxxxxxxxxx
SELECT set_current_tenant_id('No_such_tenant');
UPDATE user_info SET name = 'Boss' WHERE id = 1;
Output:
xxxxxxxxxx
UPDATE 0
Based on the output, we can see that no record was updated.
Executing SQL statement without setting tenant:
Let's try to check what will happen when the tenant is not being set, and we are trying to modify data.
xxxxxxxxxx
UPDATE user_info SET name = 'Boss' WHERE id = 1;
We can get such error, the same which had with the similar case for INSERT statement:
xxxxxxxxxx
ERROR: unrecognized configuration parameter "posmulten.tenant_id"
SQL state: 42704
Searching Data
Let's add some more data for the presentation:
xxxxxxxxxx
SELECT set_current_tenant_id('XXX');
INSERT INTO posts(id, name, user_id) VALUES (11, 'post 11', 1);
INSERT INTO posts(id, name, user_id) VALUES (12, 'post 12', 1);
INSERT INTO posts(id, name, user_id) VALUES (13, 'post 13', 2);
SELECT set_current_tenant_id('YYY');
INSERT INTO posts(id, name, user_id) VALUES (21, 'post 21', 1);
Selecting data for "XXX" tenant:
xxxxxxxxxx
SELECT set_current_tenant_id('XXX');
SELECT COUNT(*) FROM posts;
Output:
xxxxxxxxxx
"5"
xxxxxxxxxx
SELECT set_current_tenant_id('XXX');
SELECT * FROM posts;
Output:
Selecting data for "YYY" tenant:
xxxxxxxxxx
SELECT set_current_tenant_id('YYY');
SELECT COUNT(*) FROM posts;
Output:
xxxxxxxxxx
"1"
xxxxxxxxxx
SELECT set_current_tenant_id('YYY');
SELECT * FROM posts;
Output:
Search statement with JOIN instruction:
Selecting data for "XXX" tenant:
xxxxxxxxxx
SELECT set_current_tenant_id('XXX');
SELECT u.id, u.tenant, u.name, p.* FROM user_info u JOIN posts p ON u.id = p.user_id;
Output:
Selecting data for "YYY" tenant:
SELECT set_current_tenant_id('YYY');
SELECT u.id, u.tenant, u.name, p.* FROM user_info u JOIN posts p ON u.id = p.user_id;
Output:
Search statement with LEFT JOIN instruction:
Selecting data for "XXX" tenant:
xxxxxxxxxx
SELECT set_current_tenant_id('XXX');
SELECT u.id, u.tenant, u.name, p.* FROM user_info u LEFT JOIN posts p ON u.id = p.user_id;
Output:
Search statement with RIGHT JOIN instruction:
Selecting data for "XXX" tenant:
xxxxxxxxxx
SELECT set_current_tenant_id('XXX');
SELECT u.id, u.tenant, u.name, p.* FROM user_info u RIGHT JOIN posts p ON u.id = p.user_id;
Output:
Selecting all data:
It is always possible to select all data by superuser or table owner (unless row level policy is forced for checking a table owner) without setting the current tenant.
xxxxxxxxxx
SELECT * FROM posts;
Output:
Deleting Data
The Row Level Security is applied for the DELETE statement. Below there is an example deleting all data for one tenant (without WHERE phrase) and then displaying data for the other tenant.
xxxxxxxxxx
SELECT set_current_tenant_id('XXX');
DELETE FROM posts;
Output:
xxxxxxxxxx
DELETE 5
Selecting data for the different tenant:
xxxxxxxxxx
SELECT set_current_tenant_id('YYY');
SELECT * FROM posts;
Output:
Summary
Based on the presented examples, we can admit that it is possible to implement the shared schema strategy in the Postgres database by using the row-level security policy.
Opinions expressed by DZone contributors are their own.
Comments