Upgrading Kubernetes Clusters With Cluster API on Oracle Cloud
Cluster API makes upgrading Kubernetes easy on Oracle Cloud.
In this post, I’ll cover an awesome feature of Cluster API: the ability to do a rolling upgrade of your Kubernetes cluster. Cluster API makes the process simple and repeatable.
I’ll be totally honest: I’ve manually upgraded a Kubernetes cluster before, and it wasn’t the end of the world. But I’m a lazy hacker, so why do it by hand when I can automate it and get the safety of repeatability?
What Is Cluster API?
If you are not familiar with Cluster API, it is a Kubernetes sub-project focused on providing declarative APIs and tooling to simplify provisioning, upgrading, and operating multiple Kubernetes clusters. As an analogy, think of Cluster API as a Java interface: it defines Kubernetes-style APIs for managing the infrastructure a cluster needs, without dictating how they are implemented.
Back to our Java analogy: to use an interface, you need a class that implements it. Cluster API uses an infrastructure provider model to extend support to multiple clouds, and almost every major infrastructure provider implements one. Oracle’s provider is found here, and it can be used to build clusters in Oracle Cloud. For a brief introduction on getting started with our provider, check out https://blogs.oracle.com/cloud-infrastructure/post/create-and-manage-kubernetes-clusters-on-oracle-cloud-infrastructure-with-cluster-api or read our documentation. You should also check out Cluster API’s excellent book.
Create a New Kubernetes Image
In order to upgrade our nodes, we need to use the Kubernetes Image Builder to build a new image. Follow the detailed steps in the Building Images section for prerequisites and other setup.
We then need to set the kubernetes version info to a newer version than our current cluster. Right now, the cluster is running 1.22.9, and we want to upgrade to 1.23.6 (current release versions can be found at https://kubernetes.io/releases/). We will edit images/capi/packer/config/kubernetes.json and change the following:
"kubernetes_deb_version": "1.23.6-00",
"kubernetes_rpm_version": "1.23.6-0",
"kubernetes_semver": "v1.23.6",
"kubernetes_series": "v1.23"
After the config is updated, we will use the Ubuntu 20.04 build target to create the new image with Packer:
$ cd <root_of_image_builder_repo>/images/capi
$ PACKER_VAR_FILES=oci.json make build-oci-ubuntu-2004
This will launch an instance in OCI to build the image. Once done, you should get the new image’s OCID as output. You can also confirm the image was built by visiting https://console.us-phoenix-1.oraclecloud.com/compute/images. Save this OCID, as we will be using it later.
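For reference, the oci.json file passed via PACKER_VAR_FILES supplies the OCI-specific Packer variables for the build. A minimal sketch might look like the following; treat the exact variable names as assumptions and check the Image Builder documentation for the authoritative list:

{
  "compartment_ocid": "ocid1.compartment.oc1..<unique_id>",
  "subnet_ocid": "ocid1.subnet.oc1.phx.<unique_id>",
  "availability_domain": "<availability_domain_name>"
}

It can also be handy to stash the reported image OCID in an environment variable (for example, export IMAGE_ID=ocid1.image.oc1.phx.<unique_id>) so it is ready to paste into the machine templates below.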
Upgrade My Cluster Using Cluster API
One of the main goals of Cluster API is:
To manage the lifecycle (create, scale, upgrade, destroy) of Kubernetes-conformant clusters using a declarative API.
Automating the upgrade process is a big win. I don’t want to have to cordon and drain nodes by hand to do a rolling update; the tooling should do this for me.
I’m going to assume you already have a management and a workload cluster up and running. If not, follow the Getting Started guide to create the workload cluster. Below is an example of how I created my workload cluster:
...
clusterctl generate cluster oci-cluster-phx --kubernetes-version v1.22.9 \
--target-namespace default \
--control-plane-machine-count=3 \
--from https://github.com/oracle/cluster-api-provider-oci/releases/download/v0.3.0/cluster-template.yaml | kubectl apply -f -
Now that we have a workload cluster up and running and a new image, it is time to upgrade. The high-level steps of the upgrade process are as follows:
- Upgrade the control plane
- Upgrade the worker machines
Before we start, let’s check the version of our running workload cluster. In order to access the workload cluster, we need to export its Kubernetes config from our management cluster:
$ clusterctl get kubeconfig oci-cluster-phx -n default > oci-cluster-phx.kubeconfig
Once we have the kubeconfig file, we can check the version of our workload cluster:
$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9", GitCommit:"6df4433e288edc9c40c2e344eb336f63fad45cd2", GitTreeState:"clean", BuildDate:"2022-04-13T19:52:02Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Notice the Server Version is v1.22.9. Let’s change that.
Upgrade the Control Plane
First, let’s make a copy of the machine template for the control plane:
$ kubectl get ocimachinetemplate oci-cluster-phx-control-plane -o yaml > control-plane-machine-template.yaml
We need to modify the following fields:
- spec.template.spec.imageId - use the OCID of the custom image we created earlier
- metadata.name - give the template a new name, for example: oci-cluster-phx-control-plane-v1-23-6
The edited template should look roughly like the sketch below.
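Here is a trimmed sketch of the edited template, keeping only the fields relevant to the upgrade (the OCID is illustrative, the apiVersion should match your exported file, and the rest of the exported spec stays as-is):

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OCIMachineTemplate
metadata:
  name: oci-cluster-phx-control-plane-v1-23-6
  namespace: default
spec:
  template:
    spec:
      imageId: ocid1.image.oc1.phx.<unique_id>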
Once the fields are modified, we can apply them to the cluster. Note that this will only create the new machine template. The next step will trigger the actual update.
$ kubectl apply -f control-plane-machine-template.yaml
ocimachinetemplate.infrastructure.cluster.x-k8s.io/oci-cluster-phx-control-plane configured
We now want to tell the KubeadmControlPlane resource about the new machine template and update the version number. Save this patch file as kubeadm-control-plane-update.yaml:
spec:
machineTemplate:
infrastructureRef:
name: oci-cluster-phx-control-plane-v1-23-6
version: v1.23.6
Then apply the patch:
$ kubectl patch --type=merge KubeadmControlPlane oci-cluster-phx-control-plane --patch-file kubeadm-control-plane-update.yaml
This will trigger the rolling update of the control plane.
We can watch the progress of the cluster via clusterctl:
$ clusterctl describe cluster oci-cluster-phx
NAME READY SEVERITY REASON SINCE MESSAGE
Cluster/oci-cluster-phx False Warning RollingUpdateInProgress 98s Rolling 3 replicas with outdated spec (1 replicas up to date)
├─ClusterInfrastructure - OCICluster/oci-cluster-phx True 4h50m
├─ControlPlane - KubeadmControlPlane/oci-cluster-phx-control-plane False Warning RollingUpdateInProgress 98s Rolling 3 replicas with outdated spec (1 replicas up to date)
│ └─4 Machines... True 9m17s See oci-cluster-phx-control-plane-ptg4m, oci-cluster-phx-control-plane-sg67j, ...
└─Workers
└─MachineDeployment/oci-cluster-phx-md-0 True 10m
└─3 Machines... True 4h44m See oci-cluster-phx-md-0-8667c8d69-47nh9, oci-cluster-phx-md-0-8667c8d69-5r4zc, ...
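Another way to follow the rollout from the management cluster is to watch the Machine objects directly; the VERSION column should show old machines being replaced by new ones at v1.23.6 (your machine names will differ):

$ kubectl get machines -w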
We can also see the rolling update starting to happen with new instances being created:
$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig get nodes -A
NAME STATUS ROLES AGE VERSION
oci-cluster-phx-control-plane-464zs Ready control-plane,master 4h40m v1.22.5
oci-cluster-phx-control-plane-7vdxp NotReady control-plane,master 27s v1.23.6
oci-cluster-phx-control-plane-dhxml Ready control-plane,master 4h48m v1.22.5
oci-cluster-phx-control-plane-dmk8j Ready control-plane,master 4h44m v1.22.5
oci-cluster-phx-md-0-cnrbf Ready <none> 4h44m v1.22.5
oci-cluster-phx-md-0-hc6fj Ready <none> 4h45m v1.22.5
oci-cluster-phx-md-0-nc2g9 Ready <none> 4h44m v1.22.5
Before an old control plane instance is terminated, it is cordoned and drained as expected:
oci-cluster-phx-control-plane-dmk8j NotReady,SchedulingDisabled control-plane,master 4h52m v1.22.5
This process should take about 15 minutes. Once all control plane nodes are upgraded, you should see the new version using kubectl version:
$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.6", GitCommit:"ad3338546da947756e8a88aa6822e9c11e7eac22", GitTreeState:"clean", BuildDate:"2022-04-14T08:43:11Z", GoVersion:"go1.17.9", Compiler:"gc", Platform:"linux/amd64"}
Upgrade the Worker Nodes
After upgrading the control plane nodes, we can now upgrade the worker nodes (see the Cluster API documentation on updating machine templates: https://cluster-api.sigs.k8s.io/tasks/updating-machine-templates.html).
First, we need to copy the machine template for the worker nodes:
$ kubectl get ocimachinetemplate oci-cluster-phx-md-0 -o yaml > worker-machine-template.yaml
You will want to modify the following fields:
- spec.template.spec.imageId - use the OCID of the custom image we created earlier
- metadata.name - give the template a new name, for example: oci-cluster-phx-md-0-v1-23-6
As with the control plane, the trimmed result looks roughly like the sketch below.
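Again, only the fields relevant to the upgrade are shown, and the OCID is illustrative; the rest of the exported spec stays as-is:

apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OCIMachineTemplate
metadata:
  name: oci-cluster-phx-md-0-v1-23-6
  namespace: default
spec:
  template:
    spec:
      imageId: ocid1.image.oc1.phx.<unique_id>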
Once the fields are modified, we need to apply them to the cluster. As before, this only creates the new machine template. The next step will start the actual update.
$ kubectl apply -f worker-machine-template.yaml
ocimachinetemplate.infrastructure.cluster.x-k8s.io/oci-cluster-phx-md-0-v1-23-6 created
We now want to point the MachineDeployment for the worker nodes at the new resource we just created. Save this patch file as worker-machine-deployment-update.yaml:
spec:
template:
spec:
infrastructureRef:
name: oci-cluster-phx-md-0-v1-23-6
version: v1.23.6
Then apply the patch which will trigger the rolling update of the machine deployment:
$ kubectl patch --type=merge MachineDeployment oci-cluster-phx-md-0 --patch-file worker-machine-deployment-update.yaml
Again, we can watch the progress of the cluster via the clusterctl command. But unlike the control plane, the MachineDeployment handles updating the worker machines, so clusterctl describe cluster will only show the machine deployment being updated. If you want to watch the rolling update happen as new instances are created, you can do the following:
$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig get nodes -A
...
oci-cluster-phx-md-0-z59t8 Ready,SchedulingDisabled <none> 55m v1.22.5
oci-cluster-phx-md-0-z59t8 NotReady,SchedulingDisabled <none> 56m v1.22.5
If you have pods running on the worker machines, you will see them get migrated to the new machines:
$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig get pods
NAME READY STATUS AGE NODE
echoserver-55587b4c46-2q5hz 1/1 Terminating 89m oci-cluster-phx-md-0-z59t8
echoserver-55587b4c46-4x72p 1/1 Running 5m24s oci-cluster-phx-md-0-v1-23-6-bqs8l
echoserver-55587b4c46-tmj4b 1/1 Running 29s oci-cluster-phx-md-0-v1-23-6-btjzs
echoserver-55587b4c46-vz7gm 1/1 Running 89m oci-cluster-phx-md-0-z79bd
In our example, the workers finished updating after about 10 to 15 minutes; naturally, the more nodes you have, the longer the rolling update will take. You can check the version of all the nodes to confirm:
$ kubectl --kubeconfig=oci-cluster-phx.kubeconfig get nodes -A
NAME STATUS ROLES AGE VERSION
oci-cluster-phx-control-plane-v1-23-6-926gx Ready control-plane,master 18m v1.23.6
oci-cluster-phx-control-plane-v1-23-6-vfp5g Ready control-plane,master 24m v1.23.6
oci-cluster-phx-control-plane-v1-23-6-vprqc Ready control-plane,master 30m v1.23.6
oci-cluster-phx-md-0-v1-23-6-bqs8l Ready <none> 9m58s v1.23.6
oci-cluster-phx-md-0-v1-23-6-btjzs Ready <none> 5m37s v1.23.6
oci-cluster-phx-md-0-v1-23-6-z79bd Ready <none> 71s v1.23.6
MachineDeployment Strategies
Cluster API offers two MachineDeployment strategies: RollingUpdate and OnDelete.
The example we followed uses RollingUpdate. With this strategy, you can tune maxSurge and maxUnavailable.
Both the maxSurge and maxUnavailable values can be an absolute number (for example, 5) or a percentage of desired machines (for example, 10%).
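For example, a MachineDeployment with an explicit rolling-update strategy carries a block like the following in its spec (the values here are illustrative):

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

With maxSurge: 1 and maxUnavailable: 0, the deployment creates one new machine at a time and only removes an old machine once its replacement is ready.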
The other strategy option is OnDelete. This requires the user to fully delete an old machine to drive the update; when the machine is fully deleted, a new one will come up in its place, as shown below.
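With OnDelete, you trigger the replacement of each machine yourself by deleting its Machine object on the management cluster (the machine name here is illustrative):

$ kubectl delete machine oci-cluster-phx-md-0-8667c8d69-47nh9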
For more understanding on how the MachineDeployments with Cluster API work, check out the documentation about MachineDeployments.
Conclusion
We created a new image and rolled out an upgrade to our cluster’s control plane and worker nodes, all by making a few modifications to our configuration. Whether a cluster is small or large, the upgrade process is the same. If that isn’t a selling point for Cluster API, I don’t know what is.
The Cluster API project is growing rapidly, with many new features on the way. The OCI Cluster API provider team is working hard to bring you all the great features Cluster API has to offer, such as ClusterClass, MachinePools, and ManagedClusters.
For updates on cluster-api-provider-oci, follow the GitHub repo. We are excited to be contributing to this open source project and hope you will join us.