Coordinating "actions" for K8s operators

jameinel · 1 June 2020 17:33

@jk0ne has raised an interesting question about performing operations on a k8s application (issue 292).

The issue in this particular case is a discourse application, which may want to run a database migration script because of a configuration change. (You just got a request to run a new version of your application, and you need to migrate the database to the new schema.)

The difficulty is that you need to ensure only one unit actually runs the migration script, and it must be run inside the application pod (where the schema is defined).

If this was a normal IAAS charm, then you can use “is-leader” to determine which of the units runs the script, and then run the script directly. However, with CAAS the pod that can run “is-leader” is decoupled from the pod that can run the script.

There are a few potential ways to attack this, and it would be good to know what we recommend.

1. Talk directly to the K8s API from the operator pod, to configure a Job that represents the migration script. It can potentially use the same container image, with a different environment variable to indicate how it is meant to operate. Note that the charm still needs a way to bring down the existing application pods, since they won’t be able to speak the new db schema, and it doesn’t really want to start the new application pods until the schema has been migrated.
2. Be able to ‘kubectl exec’ a script to run inside an application pod. In this case the operator could probably configure the new pods to run, but they would go into a “suspended” state until the charm sees that the leader pod is ready and then triggers the script to run the db migration. Once that completes, it triggers a script on all the pods so that they actually start the application.
   This could also potentially be explicit charm actions. (eg, you don’t just exec something in the application, but have pre-defined scripts that you can cause to be run from the charm.) IIRC, charms don’t have a way to trigger actions; they can currently only be initiated by ‘juju run’ from a user.
3. From the application pods, be able to “juju-run is-leader” to know which pod should be running the migration script. You then need a way to communicate that the migration has been done and that the pods can resume normal operation. This one feels a bit clumsier, because it likely means that you bring up all 3 application pods with a “don’t actually run” flag set, then run the migration on the leader, which has to communicate back out when the migration is done, which in turn causes the charm to change the pod spec to say “and actually run the application as normal”. It is certainly doable, but it does need a way for the charm to indicate to a pod that it is the ‘special’ one (would this be possible with an env var that says ‘X is the special one’ and some way for pods to tell if they are X?)
4. Have the charm running inside the application pod so it has direct access to the migration script.
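
As a rough illustration of the "env var that says X is the special one" idea, a pod's entrypoint could compare a charm-supplied variable against its own pod name. This is only a sketch: `MIGRATION_POD` is a hypothetical variable the charm would have to set in the pod spec, and it assumes the container's `HOSTNAME` matches the pod name (the Kubernetes default) and that pod names are predictable enough for the charm to pick one.

```python
import os


def is_designated_pod(environ=None):
    """Return True if this pod is the one named by MIGRATION_POD.

    MIGRATION_POD is a hypothetical env var set by the charm in the pod spec.
    HOSTNAME is assumed to equal the pod name, which is the Kubernetes default.
    """
    env = os.environ if environ is None else environ
    designated = env.get("MIGRATION_POD", "")
    me = env.get("HOSTNAME", "")
    # An unset or empty MIGRATION_POD means no pod should run the migration.
    return bool(designated) and designated == me


if __name__ == "__main__":
    if is_designated_pod():
        print("this pod would run the migration script")
    else:
        print("this pod would wait for the migration to finish")
```

The pods still need some separate signal that the migration has completed before they start the application; this only covers how a pod could tell whether it is X.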

Are there other ways to solve this problem? Are there good answers for it that are already available in Juju today? (I feel like talking to the k8s api might already be possible, but it is probably a bit clumsy to enable.)

John

wallyworld · 2 June 2020 04:35

juju-run is available on the operator pod and you can run a script on the workload pod, eg
juju-run myunit/0 hostname

The script can be an action, eg
juju-run myunit/0 actions/someaction

However, juju-run cannot itself be invoked from within a hook or action. So if you want to run an action on the workload from inside a config-changed hook for example, that’s not available. This limitation is not k8s specific. Perhaps the hook could spawn a new process without the JUJU_CONTEXT_ID env var set and invoke juju-run as above (JUJU_UNIT_NAME would be used for the arg to juju-run). I haven’t tried it but I hope/expect it would work.
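
A minimal sketch of that untested workaround, in Python: copy the hook's environment, drop `JUJU_CONTEXT_ID`, and start `juju-run` as a separate process. The function names are made up for illustration, and the choice not to wait for the child is an assumption (juju-run may well block until the current hook has finished).

```python
import os
import subprocess


def build_juju_run(command, environ=None):
    """Build argv and env for invoking juju-run outside the hook context."""
    env = dict(os.environ if environ is None else environ)
    unit = env["JUJU_UNIT_NAME"]      # set by Juju inside a hook
    env.pop("JUJU_CONTEXT_ID", None)  # so juju-run does not see a hook context
    argv = ["juju-run", unit] + list(command)
    return argv, env


def spawn_juju_run(command):
    """Start juju-run detached from the hook process and do not wait for it."""
    argv, env = build_juju_run(command)
    # start_new_session detaches the child from the hook's session (assumption:
    # the hook should return without waiting on the child).
    return subprocess.Popen(argv, env=env, start_new_session=True)
```

From a config-changed hook this might be called as `spawn_juju_run(["actions/someaction"])`.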

To confirm also, the charm hooks do have access to the k8s API (/usr/local/lib/python3.6/dist-packages/kubernetes) and there’s also an example in the github issue above.
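
For the Job approach, a hook could build a Job manifest and submit it with that Python client. The following is a sketch under assumptions, not a tested recipe: the job name, image, command, and env values are placeholders, and it assumes the operator pod's service account is allowed to create Jobs in the namespace.

```python
def migration_job(name, image, command, env=None):
    """Build a batch/v1 Job manifest that runs a one-off migration container."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed migration automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "migrate",
                        "image": image,
                        "command": list(command),
                        "env": [{"name": k, "value": v}
                                for k, v in sorted((env or {}).items())],
                    }],
                },
            },
        },
    }


def submit_job(namespace, job):
    """Submit the Job from inside the cluster (requires the kubernetes package)."""
    from kubernetes import client, config
    config.load_incluster_config()  # use the pod's service account credentials
    return client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)
```

For example, `submit_job("mymodel", migration_job("discourse-migrate", "example/discourse:latest", ["rake", "db:migrate"], {"MODE": "migrate"}))`, where every argument is illustrative. The charm would still have to watch the Job for completion before starting the new application pods.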

jk0ne · 22 June 2020 17:16

It’s unclear to me. Does the operator pod get the correct RBAC rules / setup by default, or is that something that needs to be done manually?

wallyworld · 28 June 2020 22:39

The operator pod gets the roles it needs set up by default.
If the workload itself needs roles defined, the charm can do this via the k8s spec yaml/json.