In a previous blog post we showed you how to deploy EKS quickly and easily with Terraform. AWS recently released version 1.18 of Kubernetes on EKS, so now is the perfect opportunity to see how to upgrade an EKS cluster using Terraform.
For the rest of this blog it is assumed that you’ve used https://github.com/cweibel/example-terraform-eks/tree/main/eks_and_fargate to spin up your EKS cluster and a single managed Node Group.
Performing Upgrades
Upgrades can be done through either the AWS Console UI or via Terraform. We’ll assume that you want to continue to use Terraform to manage EKS after you’ve bootstrapped the environment.
Let’s follow the mama duck!
Upgrades are done in two or more phases. The first phase updates the version of Kubernetes on the master/control plane. The subsequent phases update the one or more Node Groups you have defined.
Step 1 - Upgrade the master
This is straightforward. In this repo, set the local variable in cluster.tf to the desired version:
locals {
  cluster_version = "1.18" # Assuming you initially deployed 1.17
}
Perform a terraform apply and Terraform will perform an update in-place:
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # module.eks.aws_eks_cluster.this[0] will be updated in-place
  ~ resource "aws_eks_cluster" "this" {
        arn                       = "arn:aws:eks:us-west-2:123456789012:cluster/eks-cweibel2"
        certificate_authority     = [
            {
                data = "LS0tLREDACTEDtLS0tLQo="
            },
        ]
        created_at                = "2020-09-16 16:07:38.259 +0000 UTC"
        enabled_cluster_log_types = []
        endpoint                  = "https://1234AB1AEB234567FA5EBBAA67ED8BC9.gr7.us-west-2.eks.amazonaws.com"
        id                        = "eks-cweibel2"
        identity                  = [
            {
                oidc = [
                    {
                        issuer = "https://oidc.eks.us-west-2.amazonaws.com/id/1234AB1AEB234567FA5EBBAA67ED8BC9"
                    },
                ]
            },
        ]
        name                      = "eks-cweibel2"
        platform_version          = "eks.3"
        role_arn                  = "arn:aws:iam::123456789012:role/eks-cweibel220200916160718353000000001"
        status                    = "ACTIVE"
        tags                      = {}
      ~ version                   = "1.17" -> "1.18"

        timeouts {
            create = "30m"
            delete = "15m"
        }

        vpc_config {
            cluster_security_group_id = "sg-081a41a4f850bc69b"
            endpoint_private_access   = false
            endpoint_public_access    = true
            public_access_cidrs       = [
                "0.0.0.0/0",
            ]
            security_group_ids        = [
                "sg-084cb02c3a6d6442c",
            ]
            subnet_ids                = [
                "subnet-05fa7d7ec68ecfae8",
                "subnet-07c8750e73172d61d",
                "subnet-0c9b19e2856e8b867",
            ]
            vpc_id                    = "vpc-05f0ba84696234a43"
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.
This will take a few minutes to run; don't jump ahead until this step finishes.
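If you want to keep an eye on progress from a second terminal, here's a quick sketch using the AWS CLI (the cluster name is taken from the plan output above; substitute your own). The cluster status reads UPDATING during the upgrade and returns to ACTIVE when it completes:
# Poll the control plane status and version every 30 seconds
watch -n 30 "aws eks describe-cluster --name eks-cweibel2 --query 'cluster.{status: status, version: version}' --output table"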
Step 2a - Upgrading Node Groups
Once the master node has been upgraded to the newer version, each of the Node Groups can be upgraded, following the mama duck. This is done by tainting the NodeGroup resources:
terraform taint "module.eks.module.node_groups.random_pet.node_groups[\"eks_nodes\"]"
terraform taint "module.eks.module.node_groups.aws_eks_node_group.workers[\"eks_nodes\"]"
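Tainting only marks the resources for replacement; running another terraform apply is what actually replaces them:
terraform apply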
This will not do an in-place upgrade. What it will do is:
- Spin up an entirely new NodeGroup set of EC2 instances using the newer AMI.
- Once all the new worker nodes are healthy, the older nodes will be drained and their status will change to Ready,SchedulingDisabled.
- After the pods are moved off of the old workers, the underlying EC2 instances are terminated and the upgrade is complete.
While this is happening, to get a "BOSH-like experience" of knowing the state of the upgrade, you can run a watch kubectl get nodes command. In the example below the new workers have been successfully added and the old workers are in the process of draining:
$ kubectl get nodes
NAME                                         STATUS                     ROLES    AGE   VERSION
ip-10-20-68-201.us-west-2.compute.internal   Ready                      <none>   17m   v1.18.1-eks-4c6976
ip-10-20-71-150.us-west-2.compute.internal   Ready,SchedulingDisabled   <none>   13m   v1.17.3-eks-2ba888
ip-10-20-76-255.us-west-2.compute.internal   Ready                      <none>   17m   v1.18.1-eks-4c6976
ip-10-20-83-114.us-west-2.compute.internal   Ready                      <none>   17m   v1.18.1-eks-4c6976
ip-10-20-73-121.us-west-2.compute.internal   Ready,SchedulingDisabled   <none>   13m   v1.17.3-eks-2ba888
ip-10-20-75-133.us-west-2.compute.internal   Ready,SchedulingDisabled   <none>   13m   v1.17.3-eks-2ba888
If you have additional Node Groups (managed or unmanaged) you can now loop through and taint the worker groups, as sketched below. To help avoid bumping into AWS EC2 resource quotas, you'll likely want to upgrade only one worker group at a time.
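For example, here's a minimal sketch of rotating one additional managed Node Group at a time. The node group key other_nodes is a hypothetical placeholder; substitute the keys defined in your own configuration:
# "other_nodes" is a hypothetical node group key -- use the keys from your own config
NODE_GROUP="other_nodes"

terraform taint "module.eks.module.node_groups.random_pet.node_groups[\"${NODE_GROUP}\"]"
terraform taint "module.eks.module.node_groups.aws_eks_node_group.workers[\"${NODE_GROUP}\"]"

# Wait for this group to finish rotating before starting on the next one
terraform apply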
Step 2b - Upgrading Fargate
Once the master node has been upgraded to the newer version, any newly created Fargate pods will be deployed onto instances with AMIs based on the master's version.
To upgrade existing Fargate pods, there is no Terraform resource to do this; the pods need to be destroyed and recreated. To do this without downtime for the app itself, Kubernetes Deployments should be leveraged, which can do rolling upgrades.
For example, say there is a Deployment called nginx-deployment which has 3 replicas:
$ kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
nginx-deployment-f8d6d5c66-62s4f   1/1     Running   0          51m
nginx-deployment-f8d6d5c66-7tmc9   1/1     Running   0          52m
nginx-deployment-f8d6d5c66-qlkf5   1/1     Running   0          50m
You can see the 3 Fargate nodes that are associated to these 3 pods:
$ kubectl get nodes
NAME                                                 STATUS   ROLES    AGE   VERSION
fargate-ip-10-20-79-20.us-west-2.compute.internal    Ready    <none>   49m   v1.17.8-eks-e16311
fargate-ip-10-20-80-167.us-west-2.compute.internal   Ready    <none>   50m   v1.17.8-eks-e16311
fargate-ip-10-20-83-190.us-west-2.compute.internal   Ready    <none>   51m   v1.17.8-eks-e16311
ip-10-20-67-247.us-west-2.compute.internal           Ready    <none>   84m   v1.17.3-eks-2ba888
ip-10-20-77-210.us-west-2.compute.internal           Ready    <none>   85m   v1.17.3-eks-2ba888
ip-10-20-86-121.us-west-2.compute.internal           Ready    <none>   85m   v1.17.3-eks-2ba888
Once the master has been upgraded from 1.17 to 1.18, any new Fargate pods created will use the 1.18 AMI. To rotate the existing Fargate pods, perform a rolling update:
$ kubectl -n fgnamespace rollout restart deployment nginx-deployment
deployment.apps/nginx-deployment restarted
To see the status of the rollout:
$ kubectl rollout status deployments/nginx-deployment
Waiting for deployment "nginx-deployment" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "nginx-deployment" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "nginx-deployment" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "nginx-deployment" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "nginx-deployment" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "nginx-deployment" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "nginx-deployment" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "nginx-deployment" rollout to finish: 1 old replicas are pending termination...
deployment "nginx-deployment" successfully rolled out
If you look at the list of nodes at this point, the Fargate Pods are now all leveraging the v1.18.1 AMI:
$ kubectl get nodes
NAME                                                 STATUS   ROLES    AGE     VERSION
fargate-ip-10-20-69-132.us-west-2.compute.internal   Ready    <none>   2m56s   v1.18.1-eks-a84824
fargate-ip-10-20-69-55.us-west-2.compute.internal    Ready    <none>   4m14s   v1.18.1-eks-a84824
fargate-ip-10-20-86-53.us-west-2.compute.internal    Ready    <none>   97s     v1.18.1-eks-a84824
ip-10-20-67-247.us-west-2.compute.internal           Ready    <none>   90m     v1.17.3-eks-2ba888
ip-10-20-77-210.us-west-2.compute.internal           Ready    <none>   91m     v1.17.3-eks-2ba888
ip-10-20-86-121.us-west-2.compute.internal           Ready    <none>   91m     v1.17.3-eks-2ba888
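As a quick sanity check, here's a one-liner sketch (adjust the version string to match your own upgrade) to confirm none of the Fargate nodes are still reporting the old kubelet:
# Any Fargate node still running a 1.17 kubelet shows up here; no output means they have all rotated
kubectl get nodes | grep '^fargate-' | grep 'v1.17' || echo "all Fargate nodes upgraded"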
Opinion: Important Notes for Fargate
Fargate is an odd duck when it comes to an Ops team being responsible for performing upgrades. The only way for Ops personnel to recreate the Fargate Pods is to loop through the namespaces in the Fargate Profile and perform rolling upgrades of each Deployment, as sketched below. If pods aren't associated with a Kubernetes Deployment, such as an app that only has a pod spec, tooling needs to be created to orchestrate the restart.
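Here's a minimal sketch of that loop, assuming bash and kubectl access; the namespace list is a placeholder for the namespaces covered by your Fargate Profile:
#!/usr/bin/env bash
# Placeholder list -- substitute the namespaces from your own Fargate Profile
FARGATE_NAMESPACES="fgnamespace"

for ns in ${FARGATE_NAMESPACES}; do
  for deploy in $(kubectl -n "${ns}" get deployments -o name); do
    kubectl -n "${ns}" rollout restart "${deploy}"
    kubectl -n "${ns}" rollout status "${deploy}"
  done
done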
Otherwise the responsibility of performing upgrades to Fargate Pods falls to the app owner. This could be dangerous from a compliance/security perspective.
What’s Next?
Need a dash of Cloud Foundry deployed on top of your Terraform-managed EKS cluster? Check out Deploying KubeCF to EKS; much of that blog is automated into https://github.com/cweibel/example-terraform-eks/tree/main/eks_for_kubecf_v2. If you have any questions, please feel free to ask in the comments section below.
Enjoy!