Kubernetes Upgrade: The Definitive Guide to Do-It-Yourself
Kubernetes is one of the most active projects on GitHub to date, having amassed more than 80k commits and 550 releases. The process of installing an HA Kubernetes cluster on-premises or in the Cloud is well documented and, in most cases, we don’t have to perform many steps. There are additional tools like Kops or Kubespray that help to automate some of this process.
Every so often, though, we are required to upgrade the cluster to keep up with the latest security features and bug fixes, as well as to benefit from new features being released on an ongoing basis. This is especially important when we have installed a really outdated version (for example v1.9) or if we want to automate the process and always be on top of the latest supported version.
In general, when operating an HA Kubernetes cluster, the upgrade process involves two separate tasks which must not overlap or be performed simultaneously: upgrading the Kubernetes cluster and, if needed, upgrading the etcd cluster, which is the distributed key-value backing store of Kubernetes. Let’s see how we can perform those tasks with minimal disruptions.
Kubernetes Upgrade Paths
Note that this upgrade process is specifically for Kubernetes installed manually in the Cloud or on-premises. It does not cover managed Kubernetes environments (where upgrades are handled automatically by the platform), or Kubernetes services on public clouds (such as Amazon EKS or Azure Kubernetes Service), which have their own upgrade process.
For the purposes of this tutorial, we assume that healthy 3-node Kubernetes and etcd clusters have been provisioned. I’ve set up mine using six DigitalOcean Droplets, plus one for the worker node.
Let’s say that we have the following Kubernetes master nodes all running v1.13:
Name | Address | Hostname |
kube-1 | 10.0.11.1 | kube-1.example.com |
kube-2 | 10.0.11.2 | kube-2.example.com |
kube-3 | 10.0.11.3 | kube-3.example.com |
Also, we have one worker node running v1.13:
Name | Address | Hostname |
worker | 10.0.12.1 | worker.example.com |
The process of upgrading the Kubernetes master nodes is documented on the Kubernetes documentation site. The following are the current paths:
- Upgrade from v1.12 to v1.13 HA
- Upgrade from v1.12 to v1.13
- Upgrade from v1.13 to v1.14
- Upgrade from v1.14 to v1.15
There is only one documented version for HA clusters here, but we can reuse the steps for the other upgrade paths. In this example, we are going to follow the upgrade path from v1.13 to v1.14 HA. Skipping a version (for example, upgrading from v1.13 to v1.15) is not recommended.
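As a rough illustration of the "one minor version at a time" rule, the check can be written as a small shell function. This is only a sketch: `is_single_minor_step` is a hypothetical helper name, not part of kubeadm, and it assumes version strings of the form `v1.13.0` or `1.13`.

```shell
# Hypothetical helper (not part of kubeadm): succeeds only when TARGET is
# exactly one minor version above CURRENT within the same major version.
# Note: plain POSIX sh, so the variables below are global.
is_single_minor_step() {
    current=${1#v}              # strip a leading "v": v1.13.0 -> 1.13.0
    target=${2#v}
    cur_major=${current%%.*}    # text before the first dot
    tgt_major=${target%%.*}
    cur_rest=${current#*.}      # text after the first dot: 13.0
    tgt_rest=${target#*.}
    cur_minor=${cur_rest%%.*}   # minor component: 13
    tgt_minor=${tgt_rest%%.*}
    [ "$cur_major" -eq "$tgt_major" ] && [ "$tgt_minor" -eq $((cur_minor + 1)) ]
}

# Example: compare two candidate targets against the v1.13.0 cluster.
for target in v1.14.0 v1.15.0; do
    if is_single_minor_step v1.13.0 "$target"; then
        echo "v1.13.0 -> $target: single minor step"
    else
        echo "v1.13.0 -> $target: not a single minor step"
    fi
done
```

A check like this could run at the top of an automation script so that an accidental two-version jump stops before any package is touched.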
Before we start, we should always check the release notes of the version that we intend to upgrade, just in case they mention breaking changes.
Upgrading Kubernetes: A Step-by-Step Guide
Let’s follow the upgrade steps now:
1. Log In to the First Node and Upgrade the kubeadm Tool Only:
$ ssh admin@10.0.11.1
$ apt-mark unhold kubeadm && \
  apt-get update && apt-get install -y kubeadm=1.14.0-00 && apt-mark hold kubeadm
We wrap the install in apt-mark unhold and apt-mark hold because, without a hold, upgrading kubeadm would by default also upgrade the other components, such as kubelet, to the latest version (which is v1.15), and that would break the one-version-at-a-time upgrade. Marking a package as held back prevents it from being automatically installed, upgraded, or removed.
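Since the same unhold, install, hold sequence is repeated for kubeadm, kubelet, and kubectl, it could be wrapped in a small function. The following is a minimal sketch, not an official tool: `run`, `install_pinned`, and the `DRY_RUN` switch are hypothetical names introduced here for illustration.

```shell
# Hypothetical wrapper around the unhold/install/hold sequence used in this guide.
# With DRY_RUN=1 each command is printed (prefixed with "+ ") instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# install_pinned <package> <apt version>, e.g. install_pinned kubeadm 1.14.0-00
install_pinned() {
    pkg=$1
    version=$2
    run apt-mark unhold "$pkg" &&
    run apt-get update &&
    run apt-get install -y "$pkg=$version" &&
    run apt-mark hold "$pkg"
}

# Preview the commands without touching the system (subshell keeps DRY_RUN local).
(DRY_RUN=1; install_pinned kubeadm 1.14.0-00)
```

Without `DRY_RUN=1` the function runs the real commands, so it needs the same privileges as the manual steps.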
2. Verify the Upgrade Plan:
$ kubeadm upgrade plan
...
COMPONENT CURRENT AVAILABLE
API Server v1.13.0 v1.14.0
Controller Manager v1.13.0 v1.14.0
Scheduler v1.13.0 v1.14.0
Kube Proxy v1.13.0 v1.14.0
...
3. Apply the Upgrade Plan:
$ kubeadm upgrade apply v1.14.0
4. Update Kubelet and Restart the Service:
$ apt-mark unhold kubelet && apt-get update && apt-get install -y kubelet=1.14.0-00 && apt-mark hold kubelet
$ systemctl restart kubelet
5. Apply the Upgrade Plan to the Other Master Nodes:
$ ssh admin@10.0.11.2
$ kubeadm upgrade node experimental-control-plane
$ ssh admin@10.0.11.3
$ kubeadm upgrade node experimental-control-plane
6. Upgrade kubectl on all Master Nodes:
$ apt-mark unhold kubectl && apt-get update && apt-get install -y kubectl=1.14.0-00 && apt-mark hold kubectl
7. Upgrade kubeadm on First Worker Node:
$ ssh worker@10.0.12.1
$ apt-mark unhold kubeadm && apt-get update && apt-get install -y kubeadm=1.14.0-00 && apt-mark hold kubeadm
8. Log In to a Master Node and Drain First Worker Node:
$ ssh admin@10.0.11.1
$ kubectl drain worker --ignore-daemonsets
9. Upgrade kubelet Config on Worker Node:
$ ssh worker@10.0.12.1
$ kubeadm upgrade node config --kubelet-version v1.14.0
10. Upgrade kubelet on Worker Node and Restart the Service:
$ apt-mark unhold kubelet && apt-get update && apt-get install -y kubelet=1.14.0-00 && apt-mark hold kubelet
$ systemctl restart kubelet
11. Restore Worker Node:
$ ssh admin@10.0.11.1
$ kubectl uncordon worker
12. Repeat Steps 7-11 for the Rest of the Worker Nodes.
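When there are several workers, it can help to print the per-node command sequence first and review it before running anything. The sketch below only prints the commands from steps 7-11 and executes nothing. `print_worker_upgrade_plan` is a hypothetical helper, and the node name `worker-2` is made up for the example.

```shell
# Hypothetical helper: prints, per worker, the commands from steps 7-11 so the
# sequence can be reviewed first. It does not run any of them.
# Arguments: <apt package version> <kubelet version> <node name>...
print_worker_upgrade_plan() {
    pkg_version=$1       # e.g. 1.14.0-00
    kubelet_version=$2   # e.g. v1.14.0
    shift 2
    for node in "$@"; do
        echo "# --- $node ---"
        echo "[$node] apt-mark unhold kubeadm && apt-get update && apt-get install -y kubeadm=$pkg_version && apt-mark hold kubeadm"
        echo "[master] kubectl drain $node --ignore-daemonsets"
        echo "[$node] kubeadm upgrade node config --kubelet-version $kubelet_version"
        echo "[$node] apt-mark unhold kubelet && apt-get update && apt-get install -y kubelet=$pkg_version && apt-mark hold kubelet"
        echo "[$node] systemctl restart kubelet"
        echo "[master] kubectl uncordon $node"
    done
}

# "worker" is the node from this guide; "worker-2" is an assumed second node.
print_worker_upgrade_plan 1.14.0-00 v1.14.0 worker worker-2
```

Handling one node at a time, with the drain before the kubelet upgrade and the uncordon last, keeps the rest of the workers available for rescheduled pods.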
13. Verify the Health of the Cluster:
$ kubectl get nodes
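Reading the node list by eye works for a handful of nodes. For larger clusters, a filter like the following sketch could flag nodes that still need attention. `nodes_needing_attention` is a hypothetical helper, and it assumes the default `kubectl get nodes` column layout (NAME, STATUS, ROLES, AGE, VERSION).

```shell
# Hypothetical filter: reads `kubectl get nodes` output on stdin and prints
# name, status, and version of every node whose STATUS is not exactly "Ready"
# or whose VERSION differs from the expected one.
# A drained node shows "Ready,SchedulingDisabled" and is therefore reported too.
nodes_needing_attention() {
    expected=$1
    awk -v expected="$expected" '
        NR == 1 { next }    # skip the header row
        $2 != "Ready" || $5 != expected { print $1, $2, $5 }
    '
}

# Usage on a live cluster (prints nothing when every node is Ready and upgraded):
#   kubectl get nodes | nodes_needing_attention v1.14.0
```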
Etcd Upgrade Paths
As you already know, etcd is the distributed key-value backing store for Kubernetes, and it’s essentially the source of truth for the cluster. When we are running an HA Kubernetes cluster, we also want to run an HA etcd cluster so that we have a fallback in case some nodes fail.
Typically, we would have a minimum of 3 etcd nodes running with the latest supported version. The process of upgrading the etcd nodes is documented in the etcd repo. These are the current paths:
- Upgrade from 2.3 to 3.0
- Upgrade from 3.0 to 3.1
- Upgrade from 3.1 to 3.2
- Upgrade from 3.2 to 3.3
- Upgrade from 3.3 to 3.4
- Upgrade from 3.4 to 3.5
When planning for etcd upgrades, you should always follow this plan:
- Check which version you are using. For example:
$ ./etcdctl endpoint status
- Do not jump more than one minor version. For example, do not upgrade from 3.3 to 3.5. Instead, go from 3.3 to 3.4, and then from 3.4 to 3.5.
- Use the bundled Kubernetes etcd image. The Kubernetes team bundles a custom etcd image located here which contains etcd and etcdctl binaries for multiple etcd versions as well as a migration operator utility for upgrading and downgrading etcd. This will help you automate the process of migrating and upgrading etcd instances.
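The rule about not jumping more than one minor version can be sketched as a helper that lists each hop between two etcd 3.x versions. `etcd_upgrade_hops` is a hypothetical name, and the function assumes both arguments are given as `3.<minor>` without a patch component.

```shell
# Hypothetical helper: prints every single-minor hop needed to get from one
# etcd 3.x version to another. Assumes inputs like "3.3" (no patch version).
etcd_upgrade_hops() {
    from_minor=${1#3.}    # "3.3" -> "3"
    to_minor=${2#3.}      # "3.5" -> "5"
    minor=$((from_minor + 1))
    while [ "$minor" -le "$to_minor" ]; do
        echo "3.$((minor - 1)) -> 3.$minor"
        minor=$((minor + 1))
    done
}

etcd_upgrade_hops 3.3 3.5
# prints:
# 3.3 -> 3.4
# 3.4 -> 3.5
```

Each printed hop corresponds to one full pass of the upgrade procedure, including a health check before moving on to the next one.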
Out of those paths, the most important change is the path from 2.3 to 3.0, as there is a major API change which is documented here. You should also take note that:
- Etcd v3 is able to handle requests for both the v2 and v3 data. For example, we can use the ETCDCTL_API environment variable to specify the API version:
$ ETCDCTL_API=2 ./etcdctl endpoint status
- Running etcd v3 against the v2 data dir doesn’t automatically upgrade the data dir to the v3 format.
- Using the v2 API against etcd v3 only updates the v2 data stored in etcd.
You may also wonder which versions of Kubernetes have support for each etcd version. There is a small section in the documentation which says:
- Kubernetes v1.0: supports etcd2 only
- Kubernetes v1.5.1: etcd3 support added, new clusters still default to etcd2
- Kubernetes v1.6.0: new clusters created with kube-up.sh default to etcd3, and kube-apiserver defaults to etcd3
- Kubernetes v1.9.0: deprecation of etcd2 storage backend announced
- Kubernetes v1.13.0: etcd2 storage backend removed, kube-apiserver will refuse to start with --storage-backend=etcd2, with the message etcd2 is no longer a supported storage backend
So, based on that information, if you are running Kubernetes v1.12.0 with etcd2, you are required to upgrade etcd to v3 when you upgrade Kubernetes to v1.13.0, as --storage-backend=etcd2 is no longer supported. With Kubernetes v1.12.0 and below, you can run either etcd2 or etcd3.
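The compatibility list above boils down to one cut-off: the etcd2 backend is gone from Kubernetes v1.13.0 onward. A minimal sketch of that check follows. `supports_etcd2_backend` is a hypothetical helper, and it only considers Kubernetes 1.x versions.

```shell
# Hypothetical helper: succeeds if the given Kubernetes 1.x version can still
# run with the etcd2 storage backend (removed in v1.13.0 per the list above).
supports_etcd2_backend() {
    version=${1#v}          # v1.12.0 -> 1.12.0
    major=${version%%.*}    # 1
    rest=${version#*.}      # 12.0
    minor=${rest%%.*}       # 12
    [ "$major" -eq 1 ] && [ "$minor" -lt 13 ]
}

for v in v1.12.0 v1.13.0; do
    if supports_etcd2_backend "$v"; then
        echo "$v: etcd2 backend still accepted"
    else
        echo "$v: migrate the data to etcd3 first"
    fi
done
```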
Before every step, we should always perform basic maintenance procedures such as periodic snapshots and periodic smoke rollbacks.
Let’s say we have the following etcd cluster nodes:
Name | Address | Hostname |
etcd-1 | 10.0.1.1 | etcd-1.example.com |
etcd-2 | 10.0.1.2 | etcd-2.example.com |
etcd-3 | 10.0.1.3 | etcd-3.example.com |
Make sure to check the health of the cluster:
$ ./etcdctl cluster-health
member 6e3bd23ae5f1eae2 is healthy: got healthy result from http://10.0.1.1:22379
member 924e2e83f93f2565 is healthy: got healthy result from http://10.0.1.2:22379
member 8211f1d0a64f3269 is healthy: got healthy result from http://10.0.1.3:22379
cluster is healthy
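In a script, the health output can be turned into an exit status so that the upgrade stops early when a member is down. The following is a sketch under the assumption that the output has the shape shown above. `cluster_health_ok` is a hypothetical helper name.

```shell
# Hypothetical gate: reads `etcdctl cluster-health` output on stdin and succeeds
# only if no "member ..." line lacks "is healthy" and the summary line reads
# "cluster is healthy".
cluster_health_ok() {
    awk '
        /^member / && !/ is healthy/ { bad++ }     # member not reported healthy
        /^cluster is healthy$/       { ok = 1 }    # overall summary line
        END { exit !(ok && bad == 0) }
    '
}

# Usage before touching a member:
#   ./etcdctl cluster-health | cluster_health_ok || { echo "etcd not healthy, aborting"; exit 1; }
```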
Upgrading etcd
Based on the above considerations, a typical upgrade etcd procedure consists of the following steps:
1. Log In to the First Node and Stop the Existing etcd Process:
$ ssh 10.0.1.1
$ kill `pgrep etcd`
2. Backup the etcd Data Directory to Provide a Downgrade Path in Case of Errors:
$ ./etcdctl backup \
--data-dir %data_dir% \
[--wal-dir %wal_dir%] \
--backup-dir %backup_data_dir% \
[--backup-wal-dir %backup_wal_dir%]
3. Download the New Binary From the etcd Releases Page and Start the etcd Server Using the Same Configuration:
ETCD_VER=v3.3.15
# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /usr/local/etcd && mkdir -p /usr/local/etcd
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /usr/local/etcd --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
/usr/local/etcd/etcd --version
ETCDCTL_API=3 /usr/local/etcd/etcdctl version
# start etcd server
/usr/local/etcd/etcd -name etcd-1 -listen-peer-urls http://10.0.1.1:2380 -listen-client-urls http://10.0.1.1:2379,http://127.0.0.1:2379 -advertise-client-urls http://10.0.1.1:2379,http://127.0.0.1:2379
4. Repeat Step 1 to Step 3 for all Other Members.
5. Verify That the Cluster Is Healthy:
$ ./etcdctl endpoint health
10.0.1.1:12379 is healthy: successfully committed proposal: took =
10.0.1.2:12379 is healthy: successfully committed proposal: took =
10.0.1.3:12379 is healthy: successfully committed proposal: took =
Note: If you are having issues connecting to the cluster, you may need to provide HTTPS transport security certificates; for example:
$ ./etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key endpoint health
For convenience, you can use the following environment variables:
ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
Final Thoughts
In this article, we walked through step-by-step instructions for upgrading both Kubernetes and etcd clusters. These are important maintenance procedures for day-to-day operations in a typical business environment, and everyone who works with HA Kubernetes deployments should become familiar with them.
Published at DZone with permission of Theofanis Despoudis, DZone MVB. See the original article here.