Rolling Upgrade Hazelcast IMDG on Kubernetes
Check out what the Rolling Upgrade and Graceful Shutdown features bring to an integrated Hazelcast environment.
Hazelcast IMDG is tightly integrated into the Kubernetes ecosystem thanks to the Hazelcast Kubernetes plugin. In previous blog posts, we shared how to use auto-discovery for the embedded Hazelcast and steps for scaling it up and down using native kubectl commands. In this post, we'll focus on another useful feature, Rolling Upgrade. You can apply it to your Hazelcast cluster whether you use client-server or embedded Hazelcast and regardless of whether you deploy using Kubernetes StatefulSet or Deployment. Everything is, as always with Hazelcast, intuitive and straightforward.
Rolling Upgrade StatefulSet
The preferred way of deploying Hazelcast on Kubernetes is using StatefulSet. As an example, you can start a cluster using a Helm Chart or Kubernetes Code Sample.
When you update the Hazelcast Docker image version in your Kubernetes configuration, Kubernetes automatically performs the Rolling Upgrade procedure:
- Send SIGTERM signal to Pod N
- Wait maximum terminationGracePeriodSeconds (30 seconds by default)
- Send SIGKILL signal to Pod N
- Start new Pod N
- Wait until new Pod N is ready
- Send SIGTERM signal to Pod N-1
- Wait maximum terminationGracePeriodSeconds (30 seconds by default)
- Send SIGKILL signal to Pod N-1
This procedure continues until all Pods are replaced with their new versions.
Note that Hazelcast's default reaction to the SIGTERM signal is to terminate the instance immediately. So, if Pod N stores some data and the only backup of this data is stored in Pod N-1, then Kubernetes may terminate both Pods before they manage to migrate the data to the remaining members. Such a scenario means that the default behavior may result in data loss during the Rolling Upgrade procedure.
The solution for that is to enable Graceful Shutdown for Hazelcast members and to increase the termination grace period to a value that guarantees the migration's completion in the given time. We can do it with the following Kubernetes configuration:
apiVersion: apps/v1
kind: StatefulSet
spec:
  ...
  template:
    spec:
      terminationGracePeriodSeconds: 600
      containers:
      - name: hazelcast
        ...
        env:
        - name: JAVA_OPTS
          value: "-Dhazelcast.shutdownhook.policy=GRACEFUL -Dhazelcast.graceful.shutdown.max.wait=600"
        ...
Let's describe the parameters we used:
- terminationGracePeriodSeconds: number of seconds Kubernetes waits before forcing the Pod to terminate
- hazelcast.shutdownhook.policy=GRACEFUL: enables graceful shutdown for Hazelcast
- hazelcast.graceful.shutdown.max.wait: number of seconds Hazelcast waits before terminating its process (the same as terminationGracePeriodSeconds, but from the Hazelcast process perspective)
After setting these parameters, your data is safe, and you can update the Hazelcast Docker image version and apply the new Kubernetes configuration. This will result in a successful Rolling Upgrade of your Hazelcast cluster.
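For example, assuming the StatefulSet and its container are both named hazelcast (illustrative names; yours may differ), one way to trigger the upgrade and watch its progress is:
$ kubectl set image statefulset/hazelcast hazelcast=hazelcast/hazelcast:3.12.1
$ kubectl rollout status statefulset/hazelcast
Kubernetes then replaces the Pods one by one, following the procedure described earlier, and the extended grace period gives each member time to migrate its partitions.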
Hazelcast Graceful Shutdown
We used the Hazelcast Graceful Shutdown, but how does it work under the hood? The main point of the Graceful Shutdown is to migrate all data replicas owned by the shutting-down member to the other running cluster members. After this process is complete, the shutting-down member does not own any of the data (neither the primary nor the backup replicas). You can imagine the Graceful Shutdown process as follows:
- The Hazelcast member receives a signal to shut down
- It changes its state to SHUTTING_DOWN
- It sends information to the master member to start the data migration process
- It waits for the data partitions to be migrated (or until the deadline hazelcast.graceful.shutdown.max.wait is reached)
- It changes its state to SHUT_DOWN
This way, we can be sure that when a member is going to shut down, it first transfers all of its data to other members. The last thing to mention is that using hazelcast.shutdownhook.policy=GRACEFUL is not the only way to shut down Hazelcast gracefully. The alternatives are:
- The HazelcastInstance.shutdown() method (if you use Hazelcast embedded in your JVM application)
- The JMX API's shutdown method
- The "Shutdown Member" button in the Hazelcast Management Center application
Now that we understand how the Graceful Shutdown procedure works, let's come back to the main subject of this blog post, the Rolling Upgrade process.
Rolling Upgrade by Minor Version (Enterprise Only)
Hazelcast Enterprise enables Rolling Upgrades between minor versions. In other words, open source Hazelcast IMDG makes it possible to apply Rolling Upgrade only to patch versions, for example 3.12 => 3.12.1, whereas Hazelcast IMDG Enterprise lets you upgrade 3.11.4 => 3.12. This is a handy feature because you don't have to stop your cluster to keep your Hazelcast always up-to-date.
Using Rolling Upgrade by minor version requires setting one more JVM parameter (in JAVA_OPTS) for it to work automatically on Kubernetes:
-Dhazelcast.cluster.version.auto.upgrade.enabled=true
This is necessary because the Hazelcast cluster version is not updated by default. For example, we could start a cluster with the version 3.11, perform Rolling Upgrade to 3.12, and even though all members would be 3.12, the cluster would still use the 3.11 protocol. To prevent this from happening, the additional JVM parameter makes the cluster version upgrade automatically after the Rolling Upgrade procedure is complete.
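Combined with the graceful shutdown settings from earlier, the JAVA_OPTS environment variable could then look roughly like this sketch:

env:
- name: JAVA_OPTS
  value: "-Dhazelcast.shutdownhook.policy=GRACEFUL -Dhazelcast.graceful.shutdown.max.wait=600 -Dhazelcast.cluster.version.auto.upgrade.enabled=true"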
Rolling Upgrade Deployment
As mentioned previously, the preferred method of deploying Hazelcast on Kubernetes is to use StatefulSet. The main reason is that using Deployment may (in some rare cases) start a Hazelcast cluster with a split brain (which recovers in a few minutes). After all, Hazelcast is not a stateless service but rather a database, and these kinds of applications are usually deployed as StatefulSets in Kubernetes.
Nevertheless, in some cases, your system architecture may require using Deployment, or you may use Hazelcast embedded in your (micro)services and, for some reason, need to deploy them using Deployment. In such a case, you can still use Rolling Upgrade for Hazelcast, but you must remember one crucial detail: by default, Kubernetes does not perform the Rolling Upgrade Pod-by-Pod, but instead only guarantees that a certain percentage of Pods stays available. For example, if you have 10 Pods, then by default, Kubernetes will terminate 2 Pods (25%) at once and at the same time start 2 new Pods (without waiting for the old Pods to terminate). If these 2 Pods store the data and the backup for some data partition, then you may encounter data loss.
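For reference, when you don't specify an update strategy, Kubernetes uses the equivalent of the following defaults, which produce the behavior described above:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 25%
    maxSurge: 25%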
To prevent data loss in the Rolling Upgrade procedure for Deployment, you must ensure that no more Pods than the Hazelcast backup-count (1 by default) are terminated at the same time. To do it, you can add the following Kubernetes configuration:
apiVersion: apps/v1
kind: Deployment
spec:
  ...
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0

In this case, Kubernetes won't terminate more than 1 Pod at a time (maxUnavailable: 1), and it won't start any new Pod (maxSurge: 0) before terminating the old one.
Rolling Upgrade with Helm Chart
The configurations mentioned above are already implemented in the Hazelcast Helm Charts. That is why, if you use Helm to install your applications, you can start a Hazelcast cluster with the following command:
$ helm install --name my-release --set image.tag=3.12 hazelcast/hazelcast
Then, to perform the Rolling Upgrade, all you have to do is change the image tag:
$ helm upgrade my-release --set image.tag=3.12.1 hazelcast/hazelcast
Everything else happens automatically, and you can enjoy the new Hazelcast cluster version without any downtime or data loss.
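If you want to follow the rollout as it happens, you can watch the Pods being replaced one by one:
$ kubectl get pods --watch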
Published at DZone with permission of Rafał Leszko.