Non-blocking Database Migrations
Learn more about data base migrations and dealing with large data.
Join the DZone community and get the full member experience.
Join For FreeDatabase migrations are a common part of any web application. They are used to update the database schema to match the application's code. In a traditional web application, the database migrations are run synchronously, which means that the application is blocked until the migration is complete. This is not ideal, as it means that the application is unavailable to users during the migration. Long past the days when stopping the service for maintenance was acceptable; we need to be able to run migrations without blocking the application.
It's easy to perform database migrations in small databases or if you have no load. But what if you have a large database and a lot of users?
Initial Database Structure
Let's imagine we have a simple table for storing customer data:
DROP TABLE IF EXISTS customer;
CREATE TABLE
customer (
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(256),
balance FLOAT NOT NULL DEFAULT 0,
UNIQUE INDEX email_idx (email)
);
INSERT
INTO
customer (email, balance)
VALUES
('alice@example.com', 42),
('bob@example.com', -42);
My main focus will be on the "balance" field.
It's a float, which means that it can have a fractional part. We want to change the type of this field to an integer so that we can store the balance in cents. This is a common practice in financial applications.
But perhaps you already have an active application with millions of users. As a developer, your responsibility is to make sure the application is available to users during the migration. Also, you need to make sure the application will work correctly after the migration, and in case it's not working properly, you have to roll back your changes without many customers noticing it.
So the plan is:
- Create a new field "balance_cents" with the integer type.
- Deploy the application with the new field. A new application version should be available for a limited amount of users.
- Make sure the application is working correctly with the new field.
- Deploy the new application version for all users.
- Drop the old field "balance".
The tricky part is to make data consistent between the old and new fields. As these fields are related to each other, we need to make sure changes in the old field are reflected in the new field, and vice versa.
Let's go step by step and see how we can implement this plan.
Create a New Field
Creating a new field "balance_cents" with the integer type is a simple task:
ALTER TABLE
customer
ADD COLUMN
balance_cents INT NOT NULL DEFAULT 0;
Set Up Synchronization Between the Old and New Fields
At this point, the application does not know anything about the new field, and it is a perfect time to set up synchronization between the old and new fields. And to achieve this, we need to create triggers
First, let's create a trigger on the insert.
DROP TRIGGER IF EXISTS insert_balances;
CREATE TRIGGER insert_balances
BEFORE INSERT
ON customer
FOR EACH ROW IF new.balance <> 0 THEN -- insert from the old code
SET
new.balance_cents = CEIL(new.balance * 100);
ELSEIF new.balance_cents <> 0 THEN -- insert from the new code
SET
new.balance = new.balance_cents / 100;
END IF;
$$
DELIMITER ;
Let's do some experiments to see how this trigger works.
Insert from the old code:
INTO
customer (email, balance)
VALUES
('account_from_old_code@example.com', -1.23);
SELECT
email,
balance,
customer.balance_cents
FROM
customer
WHERE
email = 'account_from_old_code@example.com';
+-----------------------------------+---------+---------------+
| email | balance | balance_cents |
+-----------------------------------+---------+---------------+
| account_from_old_code@example.com | -1.23 | -123 |
+-----------------------------------+---------+---------------+
Insert from the new code:
INSERT
INTO
customer (email, customer.balance_cents)
VALUES
('account_from_new_code@example.com', 345);
SELECT
email,
balance,
customer.balance_cents
FROM
customer
WHERE
email = 'account_from_new_code@example.com';
+-----------------------------------+---------+---------------+
| email | balance | balance_cents |
+-----------------------------------+---------+---------------+
| account_from_new_code@example.com | 3.45 | 345 |
+-----------------------------------+---------+---------------+
So the trigger works as expected.
Now let's create a trigger for an update.
DROP TRIGGER IF EXISTS update_balances;
DELIMITER $$
CREATE TRIGGER update_balances
BEFORE UPDATE
ON customer
FOR EACH ROW IF new.balance <> old.balance THEN -- update from the old code
SET
new.balance_cents = CEIL(new.balance * 100);
ELSEIF new.balance_cents <> old.balance_cents THEN -- update from new code
SET
new.balance = new.balance_cents / 100;
END IF;
$$
DELIMITER ;
And now let's test it.
Making updates from the old code:
UPDATE customer
SET
balance = -1.45
WHERE
email = 'account_from_new_code@example.com';
SELECT
email,
balance,
balance_cents
FROM
customer
WHERE
email = 'account_from_new_code@example.com';
+-----------------------------------+---------+---------------+
| email | balance | balance_cents |
+-----------------------------------+---------+---------------+
| account_from_new_code@example.com | -1.45 | -145 |
+-----------------------------------+---------+---------------+
Making updates from the new code:
UPDATE customer
SET
balance_cents = 567
WHERE
email = 'account_from_new_code@example.com';
SELECT
email,
balance,
balance_cents
FROM
customer
WHERE
email = 'account_from_new_code@example.com';
+-----------------------------------+---------+---------------+
| email | balance | balance_cents |
+-----------------------------------+---------+---------------+
| account_from_new_code@example.com | 5.67 | 567 |
+-----------------------------------+---------+---------------+
Our triggers work as expected. Now we need to fill the empty "balance_cents" field with data from the "balance" field.
Filling the Empty “balance_cents” Field
The simplest way to fill the empty "balance_cents" field is to use the UPDATE statement:
UPDATE customer
SET
balance_cents = CEIL(balance * 100);
But that update query will put a lot of pressure on the database. And as our main goal is to avoid downtime, the update process should be performed in small batches.
It is possible to create a migration script inside the application, but as we are playing with SQL, let's create a stored procedure.
DROP PROCEDURE IF EXISTS batch_update_balance_cents;
DELIMITER $$
CREATE PROCEDURE batch_update_balance_cents(
start_id INT,
end_id INT,
batch_size INT)
BEGIN
DECLARE batch_start INT DEFAULT start_id;
DECLARE batch_end INT DEFAULT start_id + batch_size;
IF end_id < start_id + batch_size THEN
SET end_id = start_id + batch_size;
END IF;
WHILE batch_end <= end_id
DO
UPDATE customer
SET
balance_cents = CEIL(balance * 100)
WHERE
id BETWEEN batch_start AND batch_end;
SET batch_start = batch_start + batch_size;
SET batch_end = batch_end + batch_size;
END WHILE;
END$$
DELIMITER ;
CALL batch_update_balance_cents(1, (SELECT
MAX(id)
FROM
customer), 1000);
And now let's check the result:
SELECT *
FROM
customer;
+----+-----------------------------------+---------+---------------+
| id | email | balance | balance_cents |
+----+-----------------------------------+---------+---------------+
| 1 | alice@example.com | 42 | 4200 |
| 2 | bob@example.com | -42 | -4200 |
| 3 | account_from_old_code@example.com | -1.23 | -123 |
| 4 | account_from_new_code@example.com | 5.68 | 568 |
+----+-----------------------------------+---------+---------------+
All old entries were updated.
Drop Triggers and Stored Procedure
Migration is done. Everyone is happy about our latest changes. New application code is deployed to all customers.
It's time to drop triggers and stored procedures.
DROP PROCEDURE IF EXISTS batch_update_balance_cents;
DROP TRIGGER IF EXISTS update_balances;
DROP TRIGGER IF EXISTS insert_balances;
Drop the Old Field
And now, no one uses the old field. It's time to drop it.
ALTER TABLE customer
DROP COLUMN balance;
Conclusion
In this article, we have shown how to migrate from one field to another without downtime.
We have used triggers and stored procedures to keep the data in sync.
I used a particular example with the balance field, but it's possible to use the same approach for any other field or set of fields.
Opinions expressed by DZone contributors are their own.
Comments