How to reboot or replace machines in a Kubernetes cluster

I set up a Kubernetes cluster on AWS using Juju, following this link:

Everything works great. Now I have a few issues to solve:

  1. How to reboot these machines?
    When I “juju ssh” to these machines, I see "*** System restart required ***".
    Can I just run “sudo reboot” to reboot them?
    This seems dangerous to me, since it may crash the cluster.
    For example, how do I reboot these machines:
    easyrsa/0
    etcd/0
    kubernetes-master/0
    kubernetes-master/1

  2. How to replace some of these machines?
    After launching a few applications in k8s, I need to replace some of these machines with bigger EC2 instance types. How can I achieve this?
    For example, how do I upgrade these machines to bigger EC2 types:
    easyrsa/0*
    etcd/0*
    kubernetes-master/0
    kubernetes-master/1*

  3. Any good books on Juju?
    I think Juju is a good tool to manage k8s and other clusters, and the online docs are very good for getting started. Are there any good books on Juju that cover cluster management in more detail?

Thanks.

Here are the details of the k8s cluster launched from Juju on AWS, for your reference:

How can I reboot machine 0 for easyrsa/0?

How can I replace machine 0 with a new EC2 instance of a different type?

Thanks.

Does anyone in the @k8s-charmers group have any suggestions here?

They seem like great questions to answer for anyone deploying Charmed Kubernetes.

Upgrading | Ubuntu has info on how to upgrade a cluster. While it’s mainly focused on upgrading the charms and components, the same approach applies to replacing the units with larger instances, or even just rebooting the machines; just skip the bits that aren’t relevant, such as calling upgrade-charm or running the upgrade actions.

The short answer is that rebooting or replacing individual units is generally fine as long as you do it one unit at a time, though you’ll want to run the pause action on each worker before you reboot or replace it, to ensure that all workloads are drained off onto another worker.
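As a sketch of that reboot sequence for a single worker (the unit name kubernetes-worker/0 is just an example; substitute your own from juju status):

```shell
# Drain workloads off this worker onto the others (pause/resume are
# actions provided by the kubernetes-worker charm).
juju run-action --wait kubernetes-worker/0 pause

# Reboot the underlying machine over SSH.
juju ssh kubernetes-worker/0 'sudo reboot'

# Once the machine is back up and the unit shows as active again,
# return the worker to service.
juju run-action --wait kubernetes-worker/0 resume
```

Wait for the unit to report healthy in juju status before moving on to the next worker, so only one is ever out of service at a time.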


Oh, I should probably note that the process for replacing a unit would be, roughly:

$ juju run-action --wait <unit-name> pause  # if a worker
$ juju remove-unit <unit-name>
$ juju add-unit <app-name> --constraints ...
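As a concrete sketch of those steps for one of the units listed above, say etcd/0, moving to a larger instance (m5.xlarge is only an example constraint, not a recommendation):

```shell
# etcd is not a worker, so there is no pause step; remove the unit directly.
# Do this one unit at a time so the etcd cluster keeps quorum.
juju remove-unit etcd/0

# Add a replacement unit on a bigger EC2 instance type.
juju add-unit etcd --constraints "instance-type=m5.xlarge"

# Watch the new unit join and settle before replacing the next one.
juju status etcd
```

The same pattern applies to the kubernetes-master units; for kubernetes-worker units, run the pause action first as described above.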