SQL Server to Postgres Database Migration
Need help migrating your database from SQL Server to PostgreSQL? We've got you covered with useful tips for a smooth transition.
In this article, specialists at Intelligent Converters share their experience with database migration from SQL Server to Postgres. It covers tips and tricks, the most important bottlenecks, and migration best practices.
Most SQL Server to Postgres migration projects consist of the following steps:
- Find and review all SQL Server-specific entries in the source database (data types, attributes, built-in functions, operators, etc.) and determine how to convert them into PostgreSQL format.
- Translate definitions of tables, indexes, and constraints from SQL Server to Postgres with respect to differences between the syntax of the source and destination DBMS.
- Run the data migration using the most suitable approach to decrease downtime of the SQL Server database as much as possible.
- Translate stored procedures, functions, triggers, and views from SQL Server to Postgres format.
- Validate the resulting database, run performance and functional tests, check that all the required transformations are made, and fine-tune the performance.
Each of these steps is explored below in detail.
Migration of Table Definitions
Migration of table definitions from SQL Server to Postgres consists of type mapping and conversion of default values and other related attributes of every column. Most data types are identical in the two DBMS, for example BIGINT, DATE, DECIMAL, INT, MONEY, NUMERIC, REAL, SMALLINT, and TEXT. However, some types differ and require a safe mapping from SQL Server to PostgreSQL, as specified in this table:
| SQL Server | Postgres |
|---|---|
| BINARY(n) | BYTEA |
| BIT | BOOLEAN, BOOL |
| CHAR(n) where n > 8000 | TEXT |
| DATETIME | TIMESTAMP(3) |
| DATETIME2(n) | TIMESTAMP(n) |
| DATETIMEOFFSET(n) | TIMESTAMP(n) WITH TIME ZONE, TIMESTAMPTZ |
| FLOAT(n) | DOUBLE PRECISION, FLOAT8 |
| IMAGE | BYTEA |
| NCHAR(n) where n > 8000 | TEXT |
| NTEXT | TEXT |
| NVARCHAR(max) | TEXT |
| ROWVERSION | BYTEA |
| SMALLMONEY | MONEY |
| UNIQUEIDENTIFIER | CHAR(16), UUID |
| VARBINARY(max) | BYTEA |
| VARCHAR(max) | TEXT |
Also, SQL Server provides the spatial types GEOGRAPHY and GEOMETRY, which require the PostGIS extension to work in PostgreSQL.
Another point of attention is the IDENTITY attribute of integer columns in SQL Server. If both the seed and increment parts of IDENTITY are equal to 1, it may be converted into Postgres SERIAL for INT or BIGSERIAL for BIGINT. Otherwise, the IDENTITY attribute must be preserved in the Postgres table (supported since version 10) with respect to the target syntax. For example, the SQL Server table declared as follows:
CREATE TABLE Logs(
Id INT NOT NULL IDENTITY(2,4) PRIMARY KEY,
Msg VARCHAR(200)
);
must be converted to Postgres syntax as follows:
CREATE TABLE Logs(
Id INT NOT NULL GENERATED BY DEFAULT AS IDENTITY
(START WITH 2 INCREMENT BY 4) PRIMARY KEY,
Msg VARCHAR(200)
);
Migration of Data
In general, migration of data from SQL Server to Postgres is quite a straightforward process, except for a few issues. SQL Server binary data is mostly migrated into Postgres BYTEA; however, large data (exceeding 10MB) requires another approach. This is because BYTEA data can only be extracted as a single fragment; piecewise reading is not supported by Postgres for this data type. Therefore, reading huge BYTEA values may cause significant RAM overhead.
Fortunately, PostgreSQL provides an alternative way to store huge binary data, known as LARGE OBJECT, which supports stream-style access to the data. LARGE OBJECT values are stored in a special internal table called 'pg_largeobject' that can hold up to 4 billion rows, with a maximum size of 4TB per object. LARGE OBJECT supports piecewise reading, so it can be a workaround for the BYTEA limitations.
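As a minimal sketch of this approach (the table and column names here are illustrative, not part of any migration tool), a large object can be created from binary data and then read fragment by fragment:

-- Illustrative table: stores a reference (OID) into pg_largeobject
CREATE TABLE documents (
    id INT PRIMARY KEY,
    payload_oid OID
);

-- lo_from_bytea(0, ...) creates a new large object and returns its OID
INSERT INTO documents VALUES (1, lo_from_bytea(0, '\x0011223344'::BYTEA));

-- lo_get(oid, offset, length) reads a fragment without loading the whole value
SELECT lo_get(payload_oid, 0, 2) FROM documents WHERE id = 1;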
Spatial data also deserves attention during migration. It must be migrated via a special text representation known as well-known text (WKT). For example, suppose an SQL Server table is defined as follows:
CREATE TABLE spatialdata(
Id INT NOT NULL,
data1 geometry,
data2 geography
)
Then we can use the SQL Server built-in method STAsText to extract the text representation of the geometry and geography data:
SELECT Id, data1.STAsText(), data2.STAsText() FROM spatialdata
After the PostGIS extension is installed in Postgres, the spatial data must be inserted into the table through its WKT representation, which may look like this:
INSERT INTO spatialdata VALUES (
1,
'POLYGON ((5 5, 10 5, 10 10, 5 5))',
'LINESTRING (-122.36 47.656, -122.343 47.656)'
);
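For reference, a minimal sketch of the corresponding Postgres table definition, assuming the PostGIS extension is installed and the same table and column names are kept:

CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE spatialdata(
    Id INT NOT NULL,
    data1 GEOMETRY,   -- type provided by PostGIS
    data2 GEOGRAPHY   -- type provided by PostGIS
);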
SQL Server allows linking external data (stored outside the database) so that it can be treated as a regular table. Postgres offers a similar feature implemented through Foreign Data Wrapper (FDW) libraries. For example, you can use the file_fdw extension to manipulate an external CSV file as a native table.
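A minimal sketch using the file_fdw extension shipped with Postgres; the file path and column layout are illustrative:

CREATE EXTENSION file_fdw;

CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

-- Expose an external CSV file as a read-only table
CREATE FOREIGN TABLE legacy_customers (
    id   INT,
    name TEXT
) SERVER csv_files
OPTIONS (filename '/data/customers.csv', format 'csv', header 'true');

SELECT * FROM legacy_customers LIMIT 10;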
Approaches to Data Migration
One of the most important questions when migrating large databases is how to prevent unacceptable system downtime or overhead. In general, there are three known techniques of data migration:
1. Snapshot Migration is the most straightforward method, requiring all the data to be migrated in a single transaction. Obviously, this approach may cause substantial downtime of the source database while all the data is being read, to prevent data loss or corruption.
2. Piecewise Snapshot Migration splits the data into chunks and then migrates those chunks simultaneously via parallel threads or processes (see the sketch after this list). This significantly reduces downtime compared to the snapshot method. Specialists of Intelligent Converters use piecewise snapshot migration for most migration projects.
3. Changed Data Replication (CDR) is the continuous migration of data based on tracking incremental changes and replicating only the updated data. This method reduces downtime of the source system almost to zero, since it processes a minimal volume of data per transaction.
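As a minimal sketch of the piecewise snapshot idea, assuming a table with a monotonically increasing integer key (the table and range boundaries are illustrative), each worker copies one key range in parallel:

-- Worker 1 reads the first chunk on the SQL Server side:
SELECT * FROM Logs WHERE Id >= 0 AND Id < 100000;

-- Worker 2 reads the next chunk at the same time:
SELECT * FROM Logs WHERE Id >= 100000 AND Id < 200000;

-- Each chunk is then bulk-loaded into Postgres, for example via COPY:
-- COPY Logs FROM '/tmp/logs_chunk_1.csv' WITH (FORMAT csv);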
The best practice of database migration requires careful evaluation of the project's requirements to choose the most suitable method, balancing acceptable downtime, overhead, and efficiency. To automate and simplify the migration, it is reasonable to use special software like SQL Server to Postgres converter. This product provides safe type mapping, implements the piecewise snapshot method of data migration, and allows deep customization of the process according to the customer's requirements (customizing migration rules, editing the target table definition, filtering data for migration, etc.).
Migration of SQL Code
This section of the article explores the most common issues of migrating stored procedures, functions, and triggers from SQL Server to Postgres.
Types Casting
Type casting in Postgres is stricter compared to SQL Server since it requires explicit casting when calling functions, using operators, or updating data with expression results. There are two possible approaches to addressing this issue. One option is to include type-casting operators within the SQL code wherever required. Alternatively, you can use the 'anyelement' pseudo-type in Postgres, which enables versatile manipulation of data types during function calls and operations. For example:
CREATE OR REPLACE FUNCTION my_concat(str1 anyelement, str2 anyelement)
RETURNS VARCHAR
LANGUAGE plpgsql
AS $$
BEGIN
RETURN str1::VARCHAR || str2::VARCHAR;
END;
$$;
Keep in mind that when a function declares several parameters as 'anyelement', all of them must resolve to the same actual data type in each call.
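A short usage sketch of the function above, illustrating that rule:

SELECT my_concat(10, 20);                -- OK, both INT; returns '1020'
SELECT my_concat('a'::TEXT, 'b'::TEXT);  -- OK, both TEXT; returns 'ab'
-- SELECT my_concat(1, 'a'::TEXT);       -- fails: arguments resolve to different types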
Triggers
There is an issue that must be resolved when migrating triggers from SQL Server to Postgres: in SQL Server, the trigger's source code is enclosed inside the CREATE TRIGGER statement, while PostgreSQL requires a trigger to call a function containing all of the trigger's logic. For example:
CREATE OR REPLACE FUNCTION employees_on_update_func()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS
$$
BEGIN
IF (NEW.first_name <> OLD.first_name) OR (NEW.last_name <> OLD.last_name)
OR (NEW.email <> OLD.email)
THEN
INSERT INTO changes_log(id,changed_on)
VALUES(OLD.id,now());
END IF;
RETURN NEW;
END;
$$;
CREATE TRIGGER employees_on_update
BEFORE UPDATE ON employees FOR EACH ROW
EXECUTE PROCEDURE employees_on_update_func();
Built-In Functions
Migration of stored procedures, functions, and triggers from SQL Server to Postgres also requires proper replacement of all built-in functions missing in the target DBMS:
| SQL Server | Postgres |
|---|---|
| CHARINDEX($substr, $str, $start_pos) | STRPOS combined with SUBSTRING |
| CHAR | CHR |
| DATEADD($interval, $n_units, $date) | $date + $n_units * INTERVAL '1 second/minute/hour/day/month/year' |
| DATEDIFF($interval, $date1, $date2) | DATE_PART(<interval name>, $date2 - $date1) |
| DATEPART | DATE_PART |
| GETDATE | NOW() |
| IIF($condition, $expr1, $expr2) | CASE WHEN $condition THEN $expr1 ELSE $expr2 END |
| ISNULL | COALESCE |
| LEN | LENGTH |
| REPLICATE | REPEAT |
| SPACE($n) | REPEAT(' ', $n) |
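To make the mappings concrete, here are a few rewrite sketches based on the table above; the literal values are illustrative:

-- T-SQL: SELECT DATEADD(day, 7, GETDATE());
SELECT NOW() + 7 * INTERVAL '1 day';

-- T-SQL: SELECT DATEDIFF(day, @d1, @d2);
SELECT DATE_PART('day', TIMESTAMP '2024-02-01' - TIMESTAMP '2024-01-01');

-- T-SQL: SELECT IIF(LEN(@s) > 5, 'long', 'short');
SELECT CASE WHEN LENGTH('example') > 5 THEN 'long' ELSE 'short' END;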
Conclusion
Migration from SQL Server to Postgres is a complicated process that may require much time and effort for large databases. It consists of five logical phases, each with its own nuances. This article addresses only a few issues of database migration; every project has its own challenges.