Polymorphism in SQL Part One: Anyelement and Anyarray
Join the DZone community and get the full member experience.
Join For FreeThe term “polymorphism”, and its inflections, has various meanings in the context of programming languages in general. Roughly, it means “chameleon-like”. Of course, Wikipedia has something to say about it, here. PostgreSQL uses it with this meaning: “Parametric polymorphism: when one or more types are not specified by name but by… symbols that can represent any type”.
The term “variadic” has a universal meaning, and again Wikipedia has something to say about it, here. “A variadic [subprogram] has indefinite “arity”, [that is, it] accepts a variable number of arguments”.
In this two part post, I explain how YugabyteDB stored procedures implement these two central programming language notions.
- Part One (this post) explains how polymorphism is implemented by declaring a subprogram with formal parameters of the pseudo-types “anyelement” and “anyarray”.
- Part Two explains how marking a subprogram’s formal parameter with the keyword “variadic” implements that notion. I also bring the two notions together to show you how to implement a non-aggregate version of any built-in aggregate function so that you can use it outside the context of a SQL subquery, just as the “least()” and “greatest()” built-ins already (but uniquely) do this for the “min()” and “max()” aggregate functions.
You can see, then, that I’m stretching the technical meaning of “polymorphism” in the title of this post series. I’m focusing on the chameleon metaphor to capture the idea that I’m showing two powerful techniques that let functions automagically adapt to the way in which they’re invoked.
YugabyteDB is an open source, high-performance distributed SQL database built on a scalable and fault-tolerant design inspired by Google Spanner. YugabyteDB uses its own special distributed document store called DocDB. But it provides SQL and stored procedure functionality by re-using the “upper half” of the standard PostgreSQL source code. This is explained in the two part blog post “Distributed PostgreSQL on a Google Spanner Architecture”: (1) Storage Layer; and (2) Query Layer.
If you’re already used to writing PostgreSQL subprograms in your day job, then I recommend that you read the two parts of my essay in a single short sitting.
Everything that I say in this post applies in the same way to vanilla PostgreSQL, as it does to YugabyteDB. And all the code examples that I describe have identical outcomes in the two environments. There are no special distributed SQL considerations.
Understanding “anyelement” and “anyarray”
A user-defined function is usually created like this:
xxxxxxxxxx
create function f(p1 in int, p2 in boolean, ...)
returns text
immutable
language plpgsql
as $body$
begin
return <a text value>;
end;
$body$;
The function’s requirements are stated by specifying how the output value is determined by its input values. And commonly the rules can be stated only in terms of inputs of specified data types. It’s the same story for a user-defined procedure — and the same if the implementation uses “language plpgsql” or “language sql”.
Note: My earlier post, Four Compelling Use Cases for PostgreSQL Table Functions, shows examples of these two ways to implement user-defined functions.
Sometimes, though, the requirements reflect the same general rule with a specialization for each of several different input data types. Here’s a compelling example:
- Return an N-character text value that represents values of various data types according to rules that are specific to each different data type. The output should be right-justified, and should represent NULL as “<null>”.
A proper requirements statement would specify more detail. But I’ll simply show you two implementations that treat some interesting cases in a natural way. User-defined functions and procedures can be created in overload sets, where each set is for a function or procedure with a certain name and members of a set are distinguished from each other by the the data types of their formal parameters and/or by the number of these. So you could meet the present requirements by creating lots (and lots) of functions all called, say, “display()”. Here’s an example for a “boolean” input:
xxxxxxxxxx
create function display(val in boolean)
returns text
immutable
language plpgsql
as $body$
begin
return
case val
when true then ' true'
when false then ' false'
else '<null>'
end::text;
end;
$body$;
And here’s an example for a “bigint” input:
xxxxxxxxxx
create function display(val in bigint)
returns text
immutable
language plpgsql
as $body$
begin
return
case
when val is null then lpad('<null>', 7)
when abs(val) < 1000 then lpad(val::text, 7)
when abs(val) between 10^3 and 10^6 then lpad(round(val/10^3)::text||'K', 7)
when abs(val) between 10^6 and 10^9 then lpad(round(val/10^6)::text||'M', 7)
when abs(val) between 10^9 and 10^12 then lpad(round(val/10^9)::text||'G', 7)
else lpad('too big', 7)
end::text;
end;
$body$;
You can imagine similar overloads for other primitive scalar data types like “numeric”, “double precision”, “money”, “timestamp”, “timestamptz”, “interval” and so on. Maintenance would be simpler if the code for each data type were implemented in a single function. This would optimally self-document, in a single place, which data types it supported. And it could, for example, define a single manifest constant “width” to be used as the second argument for “lpad()” for values of all the supported data types. This is where “anyelement” comes to the rescue.
The anyelement Pseudo-Type
YSQL inherits the PostgreSQL type system. This implements several so-called pseudo-types. A pseudo-type cannot be used as a column data type, or for a local PL/pgSQL variable, but it can be used to declare a function’s formal parameters and result type. This blog post focuses on “anyelement” and “anyarray”.
Here’s the basic proof of concept. Make sure that you first drop any previously existing versions of the function “display()”.
xxxxxxxxxx
-- Such a user-defined type is known as a "row" type.
create type rt as (b boolean, i int);
create function display(val in anyelement)
returns text
immutable
language plpgsql
as $body$
declare
val_type constant regtype := pg_typeof(val);
begin
return
case val_type
when pg_typeof(null::boolean) then '"boolean" input'
when pg_typeof(null::int) then '"int" input'
when pg_typeof(null::bigint) then '"bigint" input'
when pg_typeof(null::rt) then '"rt" input'
else 'unsupported data type'
end::text;
end;
$body$;
Test it like this:
xxxxxxxxxx
select display(true)
union all
select display(43)
union all
select display(567890123456789)
union all
select display((true, 42)::rt)
union all
select display(now()::timestamp);
This is the result:
xxxxxxxxxx
display
-----------------------
"boolean" input
"int" input
"bigint" input
"rt" input
unsupported data type
We see, then, that “anyelement” stands for any scalar (that is non-array) data type — simple (like “boolean” or “bigint”) or composite (like a user-defined “row” type). A function with a pseudo-type formal parameter like “anyelement” (or “anyarray”) is called a polymorphic function. We can now use the approach demonstrated by the proof of concept to encapsulate “boolean”, “bigint”, and “rt” implementations of “display()” into a single polymorphic function, like this. As before, make sure that you first drop any previously existing versions of the function “display()”. And if the type “rt” already exists, you can leave in in place.
xxxxxxxxxx
create type rt as (b boolean, i int);
create function display(val in anyelement)
returns text
immutable
language plpgsql
as $body$
declare
val_type constant regtype := pg_typeof(val);
pad constant int := 22;
begin
case val_type
when pg_typeof(null::boolean) then
return
case val
when true then lpad('true', pad)
when false then lpad('false', pad)
else lpad('<null>', pad)
end::text;
when pg_typeof(null::bigint) then
return
case
when val is null then
lpad('<null>', pad)
when abs(val) < 1000 then
lpad(val::text, pad)
when abs(val) between 10^3 and 10^6 then
lpad(round(val/10^3)::text||'K', pad)
when abs(val) between 10^6 and 10^9 then
lpad(round(val/10^6)::text||'M', pad)
when abs(val) between 10^9 and 10^12 then
lpad(round(val/10^9)::text||'G', pad)
else
lpad('too big', pad)
end::text;
when pg_typeof(null::rt) then
return lpad(val::text, pad);
else
return lpad('unsupported data type'::text, pad);
end case;
end;
$body$;
Test it like this:
xxxxxxxxxx
(with v as (values
(true),
(false),
(null::boolean))
select
display(column1)
from v)
union all
(with v as (values
(123),
(234567),
(345678901),
(456789012345),
(567890123456789),
(null::bigint))
select
display(column1)
from v)
union all
select display((true, 42)::rt)
union all
select display(now());
This is the result:
xxxxxxxxxx
display
------------------------
true
false
<null>
123
235K
346M
457G
too big
<null>
(t,42)
unsupported data type
The anyarray Pseudo-Type
This is the natural partner to “anyelement”. It stands for an array of any data type. This simple example demonstrates the idea:
xxxxxxxxxx
create function array_display(arr anyarray)
returns setof text
immutable
language sql
as $body$
select display(unnest(arr));
$body$;
You take advantage, here, of a very useful built-in that materializes the elements of an array as a regular SQL table. Read about this, and more, in the “Array data types and functionality” major section of the YugabyteDB documentation—and, in particular, read about the “unnest()” built-in.
Test it like this:
xxxxxxxxxx
select array_display(array[true, false, null]::boolean[])
union all
select array_display(array[
123, 234567, 345678901, 456789012345, 567890123456789, null]::bigint[])
union all
select array_display(array[now()]::timestamp[]);
This is the result:
xxxxxxxxxx
array_display
------------------------
true
false
<null>
123
235K
346M
457G
too big
<null>
unsupported data type
Conclusion
So far, I’ve shown you just one of the two interesting and useful features that this two-part post explains:
- How to use the “anyelement” and “anyarray” pseudo-types to write a polymorphic subprogram with a dynamically determined polymorphic return value.
YugabyteDB inherits both features from PostgreSQL thanks to its unique architecture that simply uses its SQL processing code as is.
I can’t resist saying here (but I’ll say it again in the conclusion to Part Two) that Oracle Database currently supports neither anything equivalent to the pseudo-types like “anyelement” and “anyarray”, nor the ability to create variadic user-defined subprograms. And, as I never miss a chance to point out, the other distributed SQL databases with which we compare YugabyteDB (CockroachDB, TiDB, and our common inspiration Google Cloud Spanner) don’t support stored procedures at all.
Please go on now to read Part Two. It explains how to write a variadic subprogram.
Published at DZone with permission of Bryn Llewellyn. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments