OK, I just completely rebuilt the system after this: wiped everything and bootstrapped a controller on fresh, empty hardware.
Four days on from that, with only one deployment of Charmed Kubernetes in a pretty standard way… and good news: the controller’s disk didn’t fill up.
However, the controller is now virtually unresponsive. There is not much activity on the processor (according to htop), but it struggles to do anything: no Juju GUI, no response to the CLI, can’t
apt install, but can
sudo shutdown -r now wouldn’t work. So, I restarted the server with the handy MAAS IPMI control. After the restart, apt install and nslookup now work.
The mongod service starts up and begins using a decent amount of CPU and a tiny bit of disk IO.
sudo systemctl list-unit-files results in:
UNIT FILE                    STATE    VENDOR PRESET
juju-clean-shutdown.service  enabled  enabled
juju-db.service              enabled  enabled
jujud-machine-0.service      enabled  enabled
sudo service jujud-machine-0 status gives:
● jujud-machine-0.service - juju agent for machine-0
Loaded: loaded (/etc/systemd/system/jujud-machine-0.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2020-05-14 10:31:55 BST; 1h 10min ago
Main PID: 779 (bash)
Tasks: 12 (limit: 9374)
sudo service juju-db status gives:
● juju-db.service - juju state database
Loaded: loaded (/etc/systemd/system/juju-db.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2020-05-14 11:45:07 BST; 1s ago
Main PID: 23566 (mongod)
Tasks: 3 (limit: 9374)
sudo service juju-clean-shutdown status gives:
● juju-clean-shutdown.service - Stop all network interfaces on shutdown
Loaded: loaded (/etc/systemd/system/juju-clean-shutdown.service; enabled; vendor preset: enabled)
Active: inactive (dead)
But there’s still no response whatsoever from juju.
So I checked the logs with journalctl -b, and found these errors repeated:
read checksum error for 4096B block at offset 65536: calculated block checksum of 1624741532 doesn't match expected checksum of 1174969535
WT_SESSION.open_cursor: the process must exit and restart: WT_PANIC: WiredTiger library panic
May 14 10:32:22 pleach.tombull.com mongod.37017: [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 366
May 14 10:32:22 pleach.tombull.com mongod.37017: [initandlisten]
***aborting after fassert() failure
It seems like the juju-db service was just repeatedly restarting mongod and ignoring the error.
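For anyone hitting the same thing, a rough way to confirm the loop is to count the panic lines in the current boot's journal. This is only a sketch: the helper name count_wt_panics is made up for illustration, and it simply counts lines containing WT_PANIC in whatever log text it is given.

```shell
#!/bin/sh
# Hypothetical helper (not part of Juju or MongoDB):
# count WiredTiger panic lines in log text read from stdin.
count_wt_panics() {
    # grep -c prints the number of matching lines (0 if none);
    # "|| true" hides grep's non-zero exit status when nothing matches.
    grep -c 'WT_PANIC' || true
}

# Usage against the current boot's journal:
#   journalctl -b --no-pager | count_wt_panics
```

A count that keeps growing between runs, while the juju-db status shows it has only been active for a second or so, would be consistent with the restart loop described above.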
So I ran sudo service juju-db stop, followed it with sudo mongod --dbpath /var/lib/juju/db --repair, and then ran sudo service juju-db start.
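The same three steps can be wrapped in a small script. This is a sketch under assumptions rather than a recommended procedure: the run and repair_juju_db helpers and the DB_PATH and DRY_RUN variables are names I made up, and by default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the stop / repair / start sequence from this post.
# DB_PATH, DRY_RUN, run and repair_juju_db are illustrative names,
# not part of Juju or MongoDB.
DB_PATH="${DB_PATH:-/var/lib/juju/db}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

repair_juju_db() {
    # Stop the service first so mongod is not restarted mid-repair,
    # and stop at the first step that fails.
    run sudo service juju-db stop &&
    run sudo mongod --dbpath "$DB_PATH" --repair &&
    run sudo service juju-db start
}
```

Calling repair_juju_db as-is prints the three commands; setting DRY_RUN=0 before the call would actually execute them.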
Did the Juju GUI magically start working? Unfortunately not yet. But after a quick restart (sudo shutdown -r now works fine now), everything is back up and running fine.