PostgreSQL Rewrite Rules
Views are not what you think they are. In this article, we're going to break down PostgreSQL rewrite rules, how they work, and what they can and cannot do.
Join the DZone community and get the full member experience.
Join For FreeSince version 10, PostgreSQL supports the concept of rewrite rules, which allow you to change how queries are executed.
In fact, rewrite rules are how views are implemented in PostgreSQL. When you access a view, your query is actually rewritten according to the rule that was created when you created the view. A view is basically a rewrite rule, as we're about to see.
Don't get too excited, though — there is a lot of fine print. These rewrite rules are very different from MySQL's rewrite plugins, for instance, and quite a bit more complex.
In this article, we're going to look at rewrite rules, how they work, and what they can and cannot do.
The Basics
When you define a rewrite rule, you're telling PostgreSQL how to process a certain type of query against a certain table or view.
Rewrite rules can be defined for SELECT, UPDATE, INSERT, and DELETE. They do not exist for any other requests, such as DDL, SET, FETCH, etc., for reasons that will quickly become obvious.
SELECT Rules
Rewrite rules for SELECT are very restricted; they can essentially only define a view. In fact, rewrite rules for SELECT are rarely created explicitly; you might as well define a view instead, since a CREATE VIEW statement will have the same result, namely, a new row in the pg_catalog.pg_rewrite
table.
There is a very good reason for these restrictions: PostgreSQL includes rules in its interpretation of queries. In the case of SELECT rules, it will replace the name of the relation with what is specified in the rule. This means that a SELECT rule must be a simple SELECT, and therefore the rule cannot do anything that a regular view cannot do.
Let's have an example. Assume a simple table:
CREATE TABLE Persons (
id INT PRIMARY KEY,
first_name VARCHAR(200) NOT NULL,
last_name VARCHAR(200) NOT NULL,
dob DATE
);
INSERT INTO Persons VALUES
(1, 'Amy', 'Adams', '1974/08/20'),
(2, 'Brigitte', 'Bardot', '1934/09/28');
Now let's say we want to get the first and last names concatenated. We could create a view:
CREATE VIEW Persons_v AS
SELECT id, CONCAT(first_name, ' ', last_name) AS name, dob
FROM Persons
At this point we have:
That is of course the normal way of creating a view. But we could do exactly the same thing by creating a table and a SELECT rule. The table is going to be turned into a view by the addition of the rule:
CREATE TABLE Persons_t (
id INT,
name TEXT,
dob DATE
);
CREATE RULE "_RETURN" AS
ON SELECT TO Persons_t DO INSTEAD
SELECT id, CONCAT(first_name, ' ', last_name) AS name, dob
FROM Persons;
A few things to notice:
- The
Persons_t
table does not define a primary key. If it did, we could not create a SELECT rule on it. - The
name
column inPersons_t
is of type TEXT because that's the type returned by the CONCAT function. - The rule is named _RETURN because all SELECT rules must have that name.
With this "table" Persons_t
and the rewrite rule in place, we now have a view, even though we never explicitly invoked CREATE VIEW. It will show up in pg_catalog.pg_views
, for instance.
You will almost certainly never do this; there is practically no reason to do it this way, other than intellectual curiosity. But other kinds of rewrite rules are more interesting.
UPDATE, INSERT, and DELETE Rules
So far we've learned that SELECT rules are in fact limited to exactly what views can do because that's what a view is.
For all the other rewrite rules, however, we get a lot more freedom because they are meant to specify how updates should be handled, often (but not exclusively) in views.
Let's say we want to insert into our view Persons_v
. If we try the obvious:
INSERT INTO Persons_v (id, name, dob)
VALUES (3, 'Charlie Chaplin', '1889/04/16')
Then we're going to get an error:
[Code: 0, SQL State: 0A000] ERROR: cannot insert into column "name" of view "persons_v"
Detail: View columns that are not columns of their base relation are not updatable.
That makes sense. There is no way for PostgreSQL to know how to handle the name
column, which is derived using a formula.
That's where an INSERT rule comes in:
CREATE RULE PersonsInsert AS
ON INSERT TO Persons_v DO INSTEAD
INSERT INTO Persons (id, first_name, last_name, dob)
VALUES (NEW.id, SPLIT_PART(NEW.name, ' ', 1), SPLIT_PART(NEW.name, ' ', 2), NEW.dob)
We can now insert into the Persons_v
view, and the values for the first_name
and last_name
columns will be properly handled:
INSERT INTO Persons_v (id, name, dob)
VALUES (3, 'Charlie Chaplin', '1889/04/16')
We now have:
Getting Fancy
Significantly, rules can include several commands (except for SELECT rules). This is required if we want to support views that span multiple tables.
Let's create another table:
CREATE TABLE Addresses (
id INT PRIMARY KEY,
street varchar(200) NOT NULL,
city varchar(100) NOT NULL,
person_id int REFERENCES Persons ON DELETE CASCADE,
end_date timestamp
);
INSERT INTO Addresses (id, street, city, person_id) VALUES
(100, '1428 Elm Street', 'Springwood', 1),
(101, '742 Evergreen Terrace', 'Springfield', 2),
(102, '221B Baker Street', 'London', 3);
The end_date
column will come into play later, so ignore it for now.
Now we'll create a new view that spans both tables:
CREATE VIEW PersonAddress AS
SELECT p.id as pid,
CONCAT(first_name, ' ', last_name) AS name,
a.id as aid,
CONCAT(street, ', ', city) as address
FROM Persons p
LEFT JOIN Addresses a
ON a.person_id = p.id
WHERE a.end_date IS NULL
We now have:
Just like last time, if we try to insert the PersonAddress
view, we get an error because the view contains two columns that are derived.
But we can create a rule that will handle it properly:
CREATE RULE PersonAddressInsert AS
ON INSERT TO PersonAddress DO INSTEAD
(
INSERT INTO Persons (id, first_name, last_name)
VALUES (NEW.pid, SPLIT_PART(NEW.name, ' ', 1), SPLIT_PART(NEW.name, ' ', 2));
INSERT INTO Addresses (id, street, city, person_id)
VALUES (NEW.aid, SPLIT_PART(NEW.address, ', ', 1), SPLIT_PART(NEW.address, ', ', 2), NEW.pid)
)
We can then insert into both tables at the same time using the view:
INSERT INTO PersonAddress (pid, name, aid, address)
VALUES (4, 'Doris Day', 103, '42 Wallaby Way, Sydney')
And we now have:
Pushing Into Trigger Territory
We've seen that INSERT, UPDATE, and DELETE rules can contain multiple statements, which means we can get creative.
For instance, what if we never want to update an address, but instead we want to mark the old address as obsolete with the end_date
column and insert a new address?
This sort of thing is usually handled by a trigger, but if we're going through the view for the update, we could handle it in an UPDATE rule:
CREATE RULE PersonAddressUpdate AS
ON UPDATE TO PersonAddress DO INSTEAD
(
UPDATE Persons
SET first_name=SPLIT_PART(NEW.name, ' ', 1), last_name=SPLIT_PART(NEW.name, ' ', 2)
WHERE id=NEW.pid;
INSERT INTO Addresses (id, street, city, person_id)
VALUES ((SELECT MAX(id) + 1 FROM Addresses), SPLIT_PART(NEW.address, ', ', 1),
SPLIT_PART(NEW.address, ', ', 2), NEW.pid);
UPDATE Addresses SET end_date = NOW()
WHERE id=NEW.aid;
)
SQL purists may squint a bit, but this works. Now when we run an update on the view, we're going to get a new row in the Addresses
table, the old row will have a value in the end_date
column, and the person's name will be updated:
UPDATE PersonAddress
SET name = 'Doris Kappelhoff', address = '32 Spooner Street, Quahog'
WHERE aid=103
Adding Behavior to Tables
You may have noticed the DO INSTEAD when we define a rule. There is also a DO ALSO, which allows us to add behavior to the normal execution of the query.
Generally speaking, rules cannot be recursive. You cannot have an INSERT rule on table T that includes an INSERT on table T — that would just go into infinite recursion.
But it's possible to change (say) an UPDATE into a table into (say) a DELETE if that makes sense. Rules are not just for views.
For instance, we may want to have a "garbage collection" mechanism that automatically deletes any person that no longer has an address. This could be done with a rule defined for the Addresses
table:
CREATE RULE AddressDelete AS
ON DELETE TO Addresses DO ALSO
DELETE FROM Persons
WHERE id = OLD.person_id AND
(SELECT COUNT(*) from Addresses WHERE person_id = OLD.person_id) = 1
Now if we delete the last address of a person, we will automatically delete the person who owned that address (in this case, Charlie Chaplin):
DELETE FROM Addresses WHERE id=102
It is possible to define multiple rules of the same type for the same table; in that case, they will be executed in alphabetic order.
Permissions
Because rules can affect tables that are not specifically mentioned in the original request, the question arises: what permissions should the database use when executing these rules?
In our last example, for instance, we deleted an address, which presumes we had adequate privileges on the Addresses
table to do so.
But, perhaps unbeknownst to us, that deletion also deleted a person. What if we don't have DELETE permissions on the Persons
table?
In PostgreSQL, all rules belong to the owner of the table to which the rules are attached. When a rule executes, the permissions of that table's owner will be applied when accessing tables that are not specifically mentioned in the original query.
This may sound like a security hole, but it really isn't (see the PostgreSQL documentation if you're skeptical).
Is This the Right Way to Do This?
Rewrite rules are useful, even indispensable when you want to work with views, and you need to define what happens to INSERT, UPDATE, and DELETE. If nothing else, I hope to have made you aware of that.
Rewrite rules can also be useful if you need to change the behavior of some tables, within reason.
But how far should you push this? Just because you can do it does not mean that you should.
That's not a straightforward question. Triggers can make more sense because their intent is often more clear, and they have more options (BEFORE/AFTER/INSTEAD OF, FOR EACH ROW, etc.).
But rewrite rules can sometimes be easier to express than triggers. The PostgreSQL documentation includes a discussion of this topic, for those who want to dive into it.
My general recommendation is to use rewrite rules to make non-trivial views updatable and to use triggers for other behavior. Rewrite rules are less familiar to most people, and they are completely unique to PostgreSQL. Keep in mind that the next person to maintain what you're doing may not be as experienced as you are.
But then again, every situation is unique, so as always, you'll have to rely on your best judgement.
Opinions expressed by DZone contributors are their own.
Comments