Juju hangs on deploy, though it was working fine 2 weeks ago

I am working on a custom charm (my first). I paused for the last 2 weeks while I wiped my installation and repartitioned, but before that Juju was stable and running fine with an LXD-based controller on this one machine.

This morning I installed the latest version of everything, including Juju 2.8, and found that my charm has stopped working completely. At first a machine came up that was waiting for agents. I deleted it and retried, but on the subsequent try nothing comes up in juju status, and the deploy command hangs forever.

Any ideas on how to debug this? I think it could be my partition setup, as I am using an NVMe drive via a PCIe adapter. Could that be causing instability in Linux as a whole?

It just hangs here:

$ juju deploy . 
WARNING making "hooks/ibsocket-relation-joined" executable in charm
Deploying charm "local:bionic/ibgateway-3".


juju status gives:

Every 2,0s: juju status                                                                                                         corei7: Thu Jun  4 10:57:49 2020

Model    Controller           Cloud/Region         Version  SLA          Timestamp
default  localhost-localhost  localhost/localhost  2.8.0    unsupported  10:57:50+02:00

Model "admin/default" is empty.

Edit:

It seems the system itself is unstable. I tried to uninstall the snap and it failed; then I tried to install Juju 2.7 and got errors as well:

$ sudo snap install juju --channel=2.7/stable --classic
snap "juju" is already installed, see 'snap help refresh'

$ sudo snap purge juju
error: unknown command "purge", see 'snap help'.

$ sudo snap remove juju
error: cannot perform the following tasks:
- Stop snap "juju" services ([--root / is-enabled snap.juju.fetch-oci.service] failed with exit status 1: Failed to get unit file state for snap.juju.fetch-oci.service: No such file or directory
)

It sounds like there might be some underlying system instability. I’d check several things:

  1. Check the output of journalctl -xe to see whether your system is logging errors in general.

  2. Verify that you can spin up and connect to lxd containers.

lxc launch ubuntu:20.04 test
lxc shell test

(You can also look for juju machines in the output of lxc list, and shell into them to check them for errors.)

  3. Are there interesting messages in your juju output when you deploy with --debug?

  4. Can you install and uninstall other snaps?

sudo snap install robotfindskitten
sudo snap remove --purge robotfindskitten
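The LXD and snap checks above can be scripted roughly as follows. This is only a sketch: the helper names (run_check, lxd_roundtrip, snap_roundtrip, main) and the container name juju-triage are made up for illustration, and it assumes lxc, snap, and journalctl are on the PATH. Nothing runs unless the script is called with --run.

```shell
#!/bin/sh
# Run one named check and print PASS or FAIL based on its exit status.
# (run_check is a hypothetical helper, not part of juju, lxc, or snap.)
run_check() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $name"
    else
        echo "FAIL: $name"
    fi
}

# Launch a throwaway container, run a trivial command in it, then delete it.
# The container name "juju-triage" is a placeholder chosen for this sketch.
lxd_roundtrip() {
    lxc launch ubuntu:20.04 juju-triage &&
        lxc exec juju-triage -- true
    status=$?
    lxc delete --force juju-triage
    return "$status"
}

# Install and remove an unrelated snap to confirm snapd itself is healthy.
snap_roundtrip() {
    sudo snap install robotfindskitten &&
        sudo snap remove --purge robotfindskitten
}

main() {
    run_check "LXD launch and exec" lxd_roundtrip
    run_check "snap install and remove" snap_roundtrip
    # Show the most recent errors logged during the current boot for review.
    journalctl -p err -b --no-pager | tail -n 20
}

# Only touch the system when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
    main
fi
```

A FAIL on either round trip points at the host rather than at the charm, which is the distinction the checks above are meant to draw.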

That’s where I’d start troubleshooting. Let us know if you have any questions based on that start!


Indeed, it was the system. It seems it is not stable when using an NVMe drive over a PCIe adapter, so I backed off, installed on a SATA3 drive, and voilà, everything is working 100%.

Thank you for the tips; I will use these next time.
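For anyone who suspects a similar storage problem, one rough way to look for it is to filter the kernel log for NVMe lines that also contain a word that often signals trouble. This is a heuristic sketch, not a diagnosis: the function name nvme_suspects and the keyword list are assumptions, and a clean result does not prove the drive is healthy.

```shell
#!/bin/sh
# Read log lines on stdin and keep those that mention nvme together with a
# keyword that commonly accompanies failures. The keyword list is a guess.
# (nvme_suspects is a hypothetical helper name.)
nvme_suspects() {
    grep -i 'nvme' | grep -iE 'timeout|error|reset|fail' || true
}

# Example: scan kernel messages from the current boot.
#   journalctl -k -b --no-pager | nvme_suspects
```

Any lines it prints are worth reading in full with journalctl -k -b before drawing conclusions.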
