Juju run takes a long time

I have a small model of about 150 nodes. In an HPC context, this is small.

I need to perform some admin tasks for my test, so I thought to use “juju run” for this.

So I did:

$ time juju run --application hpc --timeout=10m0s 'sudo mkdir -p /scratch; sudo chmod 1777 /scratch'
- Message: action terminated
  UnitId: hpc/49
- Stdout: ""
  UnitId: hpc/98
- Stdout: ""
  UnitId: hpc/99
- Message: action terminated
  UnitId: hpc/54
- Message: action terminated
  UnitId: hpc/55
- Message: action terminated
  UnitId: hpc/86
- Message: action terminated
  UnitId: hpc/59
- Message: action terminated
  UnitId: hpc/89
- Message: action terminated
  UnitId: hpc/74

ERROR timed out waiting for result from: unit hpc/11

real	10m14.241s
user	0m0.959s
sys	0m0.347s

This not only takes a long time (10 minutes before the timeout), it also times out for some units, and I'm not able to easily determine whether these commands succeeded.

I'm curious how juju will be able to handle a larger environment of a few thousand servers and perhaps some 10,000 units.

I'm not sure yet whether juju executes this serially or in parallel, and it would be good to get some idea of the progress while running on multiple targets. The current output gives no indication of how many of these commands have completed, how many are executing, how many are waiting, etc.

I’m using juju 2.8.1


It certainly looks like there’s lots of opportunity to improve. Each agent should be able to execute that command completely in parallel.

How long does something like time juju run --application hpc 'hostname' take? Is it possible that the filesystem underneath the units is the bottleneck here? I've seen problems with distributed file systems before when their metadata servers become overloaded. I doubt that's the problem here, but it may be worthwhile to rule out.


It’s a local file system on each server so this command would return in a subsecond.

I think the command is perhaps waiting on an offline agent or similar, which would be normal in a large cluster, but I'm not sure.

Something to keep in mind is that only one Juju agent daemon (whether a unit agent jujud-unit-myapp-X or a machine agent jujud-machine-X) on any given juju machine/container can be running a hook at any one time. A juju run command is treated as a hook and will wait for the machine lock before executing, even for a command as simple as this one.

One thing I can recommend to determine whether this is the issue is to connect to the units that are not returning and run "juju_machine_lock" from the command line. That will show whether a long-running hook is holding the machine lock hostage, and then you can investigate what the unit/hook holding the lock is stuck on.

If you know the command should run in ~30 seconds or less, you could shorten your juju run timeout with something like --timeout=60s in your juju run arguments. This will let the units that are not held captive by machine locks or dead machines time out sooner than the 10-minute default.

Knowing which units succeeded or failed will just require an after-run audit command that takes inventory of the results.
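As a sketch of such an audit, the units that timed out could be scraped out of the juju run output for a targeted re-run. This assumes the YAML-ish output format shown earlier in the thread; the heredoc is a hand-written stand-in, not real output:

```shell
#!/bin/sh
# List units whose juju run reported "Message: action terminated"
# (i.e. timed out), so they can be re-targeted. The sample data below
# mimics the juju run output pasted at the top of this thread.
cat <<'EOF' > results.txt
- Message: action terminated
  UnitId: hpc/49
- Stdout: ""
  UnitId: hpc/98
- Message: action terminated
  UnitId: hpc/54
EOF

# A "Message: action terminated" line flags the next UnitId as a failure.
awk '/Message: action terminated/ { bad = 1; next }
     /UnitId:/ { if (bad) print $2; bad = 0 }' results.txt

rm -f results.txt
```

With this sample data the filter prints hpc/49 and hpc/54; in practice you would pipe the real juju run output into the awk filter and feed the resulting unit list back into a re-run.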

The juju unit logs will tell you whether your juju run command was ever attempted, which helps determine whether this is juju agents being unresponsive or indeed an I/O blocking issue.

Hope this helps,
-Drew


Also, because juju run is treated as a hook, it gets queued for the agent to execute. If you run it early enough in the deployment of your model, it could get stuck waiting for the install hook, the config-changed hook, the start hook, and any number of relation-joined/relation-changed hooks. If you need to run something during deployment, I'd suggest either coding it into the charm that requires it, or waiting for the model to settle, running the command, and then deploying any other parts of the bundle that depend on that juju run.


@afreiberger thanks for the advice.

I would, however, be much helped by some way to see per-unit status while the command executes. That would leave me a lot less in the dark; as it stands, the state can only be assessed by a debug session, which will be very difficult in models with 1000+ nodes.

I have experience with "Rocks Clusters", which executes commands in parallel and outputs results that easily show which nodes have completed the command in time and which have yet to complete or have failed. This gives me a workflow where I immediately know which nodes need a "re-run" or a "fix" applied.

This would absolutely be a killer feature if it were reliable and fast across many, many units/nodes.

Here is how rocks does it: https://cheatography.com/brie/cheat-sheets/rocks-cluster-commands/

… and more docs on that: https://www.rocksclusters.org/assets/usersguides/roll-documentation/base/6.2/x6396.html

You may be interested in the juju commands "show-action-status" and "show-action-output" to identify the status of each unit's run.
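To turn that status listing into a quick summary, the per-unit statuses could be tallied by state. The YAML below is a hand-written mock-up standing in for show-action-status output (field names assumed), so the parsing is illustrative only:

```shell
#!/bin/sh
# Tally actions by status (completed/running/failed). status.yaml is a
# mock-up standing in for "juju show-action-status" output.
cat <<'EOF' > status.yaml
actions:
- id: "101"
  status: completed
  unit: hpc/0
- id: "102"
  status: running
  unit: hpc/1
- id: "103"
  status: failed
  unit: hpc/2
EOF

# Count occurrences of each status value and print one line per state.
awk '$1 == "status:" { count[$2]++ }
     END { for (s in count) print s, count[s] }' status.yaml | sort

rm -f status.yaml
```

On a 150-unit model this gives a one-screen answer to "how many completed, how many are still running, how many failed" instead of scrolling through per-unit records.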


I did some looping with sed and awk, which feels primitive and wrong, but I can't find a better way at the moment.

juju status… sed, awk, grep

for i in $(cat machines.txt); do juju ssh $i hostname; done

I'll see if I can find a better way later on. Thanks for helping out.
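One common improvement to a serial loop like the one above is fanning it out with xargs -P, so the per-machine commands run concurrently instead of one after another. This is a generic sketch with echo standing in for the juju ssh call:

```shell
#!/bin/sh
# Run one command per machine, up to 8 at a time, via xargs -P.
# "echo" is a stand-in here; for real use, replace the sh -c body with
# something like: juju ssh {} hostname
printf '%s\n' 0 1 2 3 > machines.txt

# -I{} substitutes each machine id into the command; -P 8 runs up to
# eight of them in parallel. sort makes the interleaved output stable.
xargs -P 8 -I{} sh -c 'echo "machine {}: ok"' < machines.txt | sort

rm -f machines.txt
```

Because the parallel jobs finish in arbitrary order, piping through sort (or prefixing each line with the machine id, as here) keeps the output readable enough to spot which machines never reported back.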