Tutorial: How to Define SQL Functions With Presto Across All Connectors
Learn how to implement your own SQL function with Presto, the open-source SQL query engine.
Join the DZone community and get the full member experience.
Join For FreePresto is the open-source SQL query engine for data lakes. It supports many native functions which are usually sufficient for most use cases. However, there is maybe a corner case where you need to implement your own function. To simplify this, Presto allows users to define expressions as SQL functions. These are dynamic functions separated from the Presto source code, managed by a functions namespace manager that you can set up with a MySQL database. In fact, this is one of the most widely used features of Presto at Facebook, with over 1000s of functions defined.
Function Namespace Manager
A function namespace is a special catalog.schema
that stores functions in the format like mysql.test
. Each catalog.schema can be a function namespace. A function namespace manager is a plugin that manages a set of these function catalog schemas. The catalog can be mapped to connectors in Presto (a connector for functions, no tables or view) and allows the Presto engine to perform actions such as creating, altering, and deleting functions.
This user-defined function management is separated from connector API for flexibility, hence these SQL functions can be used across all connectors. Further, the query is guaranteed to use the same version of the function throughout the execution and any modification to the functions is versioned.
Implementation
Today, function namespace manager is implemented with the help of MySQL, so users need to have a running MySQL service to initialize the MySQL-based function namespace manager.
Step 1: Provision MySQL Server and Generate JDBC URL for Further Access
Suppose the MySQL server can be reached at localhost:3306, for example database URL –
jdbc:mysql://localhost:3306/presto?user=root&password=password
Step 2: Create Database and Tables in MySQL Database to Store Function Namespace Manager Related Data
CREATE DATABASE presto;
USE presto;
Step 3: Configure at Presto [2]
Create Function namespace manager configuration under etc/function-namespace/mysql.properties:
function-namespace-manager.name=mysql database-url=jdbc:mysql://localhost:3306/presto?user=root&password=password
function-namespaces-table-name=function_namespaces
functions-table-name=sql_functions
And restart the Presto Service.
Step 4: Create New Function Namescape
Now once the Presto server is started we will see below tables under presto database (which is being used to manage function namespace) in Mysql –
mysql> show tables;
+---------------------+
| Tables_in_presto |
+---------------------+
| enum_types |
| function_namespaces |
| sql_functions |
+---------------------+
93 rows in set (0.00 sec)
To create a new function namespace ”ahana.default”, insert into the function_namespaces table:
INSERT INTO function_namespaces (catalog_name, schema_name)
VALUES('ahana', 'default');
Step 5: Create a Function and Query From Presto [1]
Here is simple example of SQL function for COSECANT:
presto>CREATE OR REPLACE FUNCTION ahana.default.cosec(x double)
RETURNS double
COMMENT ‘Cosecant trigonometric function'
LANGUAGE SQL
DETERMINISTIC
RETURNS NULL ON NULL INPUT
RETURN 1 / sin(x);
More examples can be found at https://prestodb.io/docs/current/sql/create-function.html#examples [1]
Step 6: Apply the Newly Created Function and SQL Query
It is required for users to use a fully qualified function name while using in SQL queries.
Following the the example of using cosec SQL function in the query.
presto> select ahana.default.cosec (50) as Cosec_value;
Cosec_value
---------------------
-3.8113408578721053
(1 row)
Query 20211103_211533_00002_ajuyv, FINISHED, 1 node
Splits: 33 total, 33 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s
Here is another simple example of creating an EpochTimeToLocalDate
function to convert Unix time to local timezone under ahana.default
function namespace.
presto> CREATE FUNCTION ahana.default. EpochTimeToLocalDate (x bigint)
-> RETURNS timestamp
-> LANGUAGE SQL
-> DETERMINISTIC RETURNS NULL ON NULL INPUT
-> RETURN from_unixtime (x);
CREATE FUNCTION
presto> select ahana.default.EpochTimeToLocalDate(1629837828) as date;
date
-------------------------
2021-08-24 13:43:48.000
(1 row)
Query 20211101_230315_00043_ajuyv, FINISHED, 1 node
Splits: 33 total, 33 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Note
function-namespaces-table-name
<The name of the table that stores all the function namespaces managed by this manager> property can be used if there is a use case to instantiate multiple function namespace managers. Otherwise, if we can create functions in only one function namespace manager then it can be utilized across all different databases or connectors. [2]
Future Roadmap
Remote function Support with remote UDF thrift API
Allows you to run arbitrary functions that are either not safe or not possible to run within worker JVM: unreliable Java functions, C++, Python, etc.
References
Opinions expressed by DZone contributors are their own.
Comments