Spurious upgrade-charm hooks during initial deploy

The Charmed Kubernetes CI recently has been encountering an issue where an upgrade-charm hook is triggered during the initial deploy for no apparent reason. For example:

13:27:47 [validate-ck-bugfix-upgrade-focal-1.17-stable] + juju deploy -m validate-b792948a:validate-upgrade --overlay overlay.yaml --force --channel stable 'cs:~containers/charmed-kubernetes'
13:27:48 [validate-ck-bugfix-upgrade-focal-1.17-stable] Located bundle "cs:~containers/bundle/charmed-kubernetes-432"
13:27:48 [validate-ck-bugfix-upgrade-focal-1.17-stable] Resolving charm: cs:~containers/containerd-64
13:27:48 [validate-ck-bugfix-upgrade-focal-1.17-stable] Resolving charm: cs:~containers/easyrsa-303
13:27:48 [validate-ck-bugfix-upgrade-focal-1.17-stable] Resolving charm: cs:~containers/etcd-501
13:27:48 [validate-ck-bugfix-upgrade-focal-1.17-stable] Resolving charm: cs:~containers/flannel-477
13:27:48 [validate-ck-bugfix-upgrade-focal-1.17-stable] Resolving charm: cs:~containers/kubeapi-load-balancer-715
13:27:48 [validate-ck-bugfix-upgrade-focal-1.17-stable] Resolving charm: cs:~containers/kubernetes-master-826
13:27:49 [validate-ck-bugfix-upgrade-focal-1.17-stable] Resolving charm: cs:~containers/kubernetes-worker-661
...
13:28:13 [validate-ck-bugfix-upgrade-focal-1.17-stable] Deploy of bundle completed.
13:28:13 [validate-ck-bugfix-upgrade-focal-1.17-stable] + juju::wait
13:28:13 [validate-ck-bugfix-upgrade-focal-1.17-stable] + echo 'Waiting for deployment to settle...'
13:28:13 [validate-ck-bugfix-upgrade-focal-1.17-stable] Waiting for deployment to settle...
13:28:13 [validate-ck-bugfix-upgrade-focal-1.17-stable] + timeout 45m juju-wait -e validate-b792948a:validate-upgrade -w
13:32:54 [validate-ck-bugfix-upgrade-focal-1.17-stable] ERROR:root:kubernetes-worker/0 failed: workload status is error

This resulted in:

$ grep -E '(ran|queued) .* hook|operation upgrade' kubernetes-worker_0/var/log/juju/unit-kubernetes-worker-0.log 
2020-06-05 17:29:49 INFO juju.worker.uniter resolver.go:139 found queued "install" hook
2020-06-05 17:31:14 INFO juju.worker.uniter.operation runhook.go:142 ran "install" hook (via explicit, bespoke hook script)
2020-06-05 17:31:15 INFO juju.worker.uniter resolver.go:139 found queued "leader-elected" hook
2020-06-05 17:31:17 INFO juju.worker.uniter.operation runhook.go:142 ran "leader-elected" hook (via explicit, bespoke hook script)
2020-06-05 17:31:20 INFO juju.worker.uniter.operation runhook.go:142 ran "config-changed" hook (via explicit, bespoke hook script)
2020-06-05 17:31:20 INFO juju.worker.uniter resolver.go:139 found queued "start" hook
2020-06-05 17:31:51 INFO juju.worker.uniter.operation runhook.go:142 ran "start" hook (via explicit, bespoke hook script)
2020-06-05 17:31:52 DEBUG juju.worker.metrics.collect manifold.go:278 ran "collect-metrics" hook (via explicit, bespoke hook script)
2020-06-05 17:32:29 INFO juju.worker.uniter.operation runhook.go:142 ran "cni-relation-joined" hook (via explicit, bespoke hook script)
2020-06-05 17:32:32 INFO juju.worker.uniter.operation runhook.go:142 ran "kube-api-endpoint-relation-joined" hook (via explicit, bespoke hook script)
2020-06-05 17:32:44 INFO juju.worker.uniter.operation runhook.go:142 ran "container-runtime-relation-joined" hook (via explicit, bespoke hook script)
2020-06-05 17:32:44 DEBUG juju.worker.uniter.operation executor.go:83 running operation upgrade to cs:~containers/kubernetes-worker-661
2020-06-05 17:32:45 INFO juju.worker.uniter resolver.go:139 found queued "upgrade-charm" hook

In this case, the upgrade-charm hook failed due to an issue that we were specifically testing the fix for, because it obviously didn’t have the new code from an actual upgrade, which is how we noticed this.

Why is upgrade-charm running during the initial deploy? This was a regular bundle deployment

So, the upgrade-charm hook runs if there’s a change to the charm itself. The application entity in the model has a “charm modified version” attribute. The logs show the upgrade-charm hook running because this attribute has changed on the applications in question, eg kubernetes-worker, calico etc.

How does the “charm modified version” attribute get incremented?

  • a new charm URL is set against the application
  • a charm resource is updated

Updating a charm resource needs to run the update-charm hook to give the charm a chance to deal with that change.

The bundle in question has a number of charms with resources, and the way the bundle is processed by Juju, the application is added to the model and then the resources are configured. This resource set up increments the “charm modified version” attribute which triggers the charm upgrade.

I’ve raised a bug to track this Bug #1883036 “bundles don't apply applications atomically” : Bugs : juju