Controller machine giving "connection refused"

I restarted my controller, and now it gives “connection refused” on port 17070, so I basically can’t do anything with it any more. This command hangs:

laney@raleigh> juju --debug status
09:31:04 INFO  juju.cmd supercommand.go:91 running juju [2.8.0 0 d816abe62fbf6787974e5c4e140818ca08586e44 gc go1.14.4]
09:31:04 DEBUG juju.cmd supercommand.go:92   args: []string{"/snap/juju/12370/bin/juju", "--debug", "status"}
09:31:04 INFO  juju.juju api.go:67 connecting to API addresses: [[2001:8b0:df29:5:216:3eff:fe1a:b851]:17070 192.168.1.42:17070 [fd00::1:216:3eff:fe1a:b851]:17070]

(those IPs are right)

I can break into it with lxc shell, and it looks like the services are running OK:

             ├─jujud-machine-0.service
             │ ├─60093 bash /etc/systemd/system/jujud-machine-0-exec-start.sh
             │ └─60097 /var/lib/juju/tools/machine-0/jujud machine --data-dir /var/lib/juju --machine-id 0 --debug
             ├─juju-db.service
             │ └─60230 /usr/bin/mongod --auth --bind_ip_all --dbpath /var/lib/juju/db --ipv6 --journal --keyFile /var/lib/juju/shared-se...

But indeed I can’t telnet localhost 17070.
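For a repeatable version of that telnet check, a minimal Python sketch that probes a TCP port could look like the following. The helper name `can_connect` and the 3-second timeout are assumptions made for illustration; 17070 is the API port from the client output above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Run this inside the controller container to mirror the telnet test.
    print("API port open:", can_connect("localhost", 17070))
```

A result of False here only says that nothing accepted the connection. It does not say why the API server failed to start listening.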

machine-0.log is full of this:

2020-06-16 08:46:37 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [fb4aad] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused

but mongo itself seems to work:

connecting to: mongodb://localhost:37017/juju
2020-06-16T08:48:42.083+0000 W NETWORK  [thread1] SSL peer certificate validation failed: unable to get local issuer certificate
MongoDB server version: 3.6.3
Server has startup warnings: 
2020-06-16T08:41:27.948+0000 I STORAGE  [initandlisten] 
2020-06-16T08:41:27.948+0000 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2020-06-16T08:41:27.948+0000 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2020-06-16T08:41:28.854+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2020-06-16T08:41:28.854+0000 I CONTROL  [initandlisten] 
juju:OTHER> 

Can someone help me get this back up and running please? :pray:

root@juju-568192-0:/var/log/juju# /var/lib/juju/tools/machine-0/jujud --version
2.7.6-bionic-amd64
laney@raleigh> juju version
2.8.0-focal-amd64

This looks like a LXD cloud; is that right?

If it is, check that the network where the LXD server is running is not firewalled off from the controller machine.

Thanks for the reply!

Yes, it is. The controller’s in another container on the same machine. There aren’t any firewalls in play; for example, when I break into the controller I can (attempt to) SSH to other instances:

root@juju-568192-0:~# ssh 192.168.1.109
The authenticity of host '192.168.1.109 (192.168.1.109)' can't be established.
ECDSA key fingerprint is SHA256:Sl76mb2LWYn4DawxZiNu9pa6URbWngi/AP3jie2rZUs.
Are you sure you want to continue connecting (yes/no)? ^C

or I can SSH to the controller from my workstation:

laney@raleigh> ssh 192.168.1.42
The authenticity of host '192.168.1.42 (192.168.1.42)' can't be established.
ECDSA key fingerprint is SHA256:Q1nBtbM9H7DTZN2wEQ6SkWuIyeULx6xjCg5E4fHuk5I.
No matching host key fingerprint found in DNS.
Are you sure you want to continue connecting (yes/no/[fingerprint])? ^C

Look at the machine log for the controller. It should be something like /var/log/juju/machine-0.log.

The agents are probably unable to connect to the API because there’s an error preventing it from coming up.

Run “lxc info” on the machine where the LXD server is running. The output should have a section like this:

environment:
  addresses:
  - 192.168.1.101:8443
  - 10.30.30.1:8443
...

The controller needs to be able to access the server at one of those addresses.
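One way to test this from inside the controller container is to paste the addresses section of the lxc info output into a small script and probe each address. This is only a sketch: `parse_addresses` and `reachable` are hypothetical helper names, the 2-second timeout is arbitrary, and the parser assumes the plain "- host:port" layout shown above rather than handling YAML in general (quoted entries, for instance, are skipped).

```python
import socket


def parse_addresses(lxc_info_text):
    """Extract (host, port) pairs from '- host:port' lines."""
    pairs = []
    for line in lxc_info_text.splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue
        host, sep, port = line[2:].strip().rpartition(":")
        if not sep or not port.isdigit():
            continue
        # IPv6 literals are written as [addr]:port; drop the brackets.
        pairs.append((host.strip("[]"), int(port)))
    return pairs


def reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Sample copied from the excerpt above; replace with your own output.
    sample = """
environment:
  addresses:
  - 192.168.1.101:8443
  - 10.30.30.1:8443
"""
    for host, port in parse_addresses(sample):
        status = "ok" if reachable(host, port) else "unreachable"
        print(f"{host}:{port} {status}")
```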

That sounds right. That log file is full of

2020-06-21 19:38:46 ERROR juju.worker.dependency engine.go:671 "state" manifold worker returned unexpected error: no reachable servers
2020-06-21 19:39:44 ERROR juju.worker.dependency engine.go:671 "api-caller" manifold worker returned unexpected error: [fb4aad] "machine-0" cannot open api: unable to connect to API: dial tcp 127.0.0.1:17070: connect: connection refused

over and over. Perhaps that indicates what you’re saying?
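When a log repeats the same errors like this, tallying them per worker can make it easier to see which workers are failing and how often. A minimal Python sketch, assuming the line format shown in the excerpts in this thread; `count_manifold_errors` is a hypothetical helper name, and the log path is passed as an argument (for example /var/log/juju/machine-0.log).

```python
import re
import sys
from collections import Counter

# Matches the tail of lines such as:
#   ... engine.go:671 "state" manifold worker returned unexpected error: no reachable servers
PATTERN = re.compile(r'"([^"]+)" manifold worker returned unexpected error: (.*)')


def count_manifold_errors(lines):
    """Count (worker name, error message) pairs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[(match.group(1), match.group(2).strip())] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 tally.py /var/log/juju/machine-0.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for (worker, message), n in count_manifold_errors(log).most_common():
                print(f"{n:6d}  {worker}: {message}")
```

The counts only show frequency. Which error is the root cause still has to be read from the first failures after the restart.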

Right. There are a lot of IPs in the lxc info output, some good and some bad. This:

  - 192.168.1.136:8443

is one such, and:

root@juju-568192-0:/var/log/juju# telnet 192.168.1.136 8443
Trying 192.168.1.136...
Connected to 192.168.1.136.
Escape character is '^]'.

it seems to work OK from the controller’s container.

I think you should file a bug for this and supply the log from the controller machine, either as an attachment or in a pastebin if it contains no sensitive information.