Having some trouble with a new (PoC, MAAS + Juju) OpenStack installation using the mysql-innodb-cluster charm.
There are three database servers with fixed IPv4 addresses. The hardware RAID on one of them failed, causing the root filesystem to be remounted read-only. Since the failed server was the leader at the time, the charm elected a new leader. The Juju agent on the failed machine was lost and never recovered, while the other two servers carried on. I forcibly removed the unit and machine from Juju, after which the charm went into a blocked state; of course, a cluster has to have three members.
The RAID card was replaced, the disks were wiped, and the server was brought back to Ready in MAAS. Being naïve, I thought I could run `add-unit` and the server would be added to the cluster, replacing the old member at that address.
The `add-unit` did deploy, but now the charm is in an error state because the new unit is not in a cluster. Running the `cluster-status` action on one of the other units reveals that the address is still registered in the cluster. So, I thought, the next step would be to remove the instance at that address from the cluster.
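For reference, this is how I checked the cluster state (unit name follows the naming used below; any surviving unit should do):

```shell
# Ask a surviving unit for the InnoDB Cluster topology; the failed
# server's address still shows up as a member here.
juju run-action --wait mysql/0 cluster-status
```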
Running `juju run-action --wait mysql/1 remove-instance --string-args address=172.30.50.10` returns the following:
```
UnitId: mysql/1
id: "86"
message: Remove instance failed
results:
  output: |+
    Logger: Tried to log to an uninitialized logger.
    Traceback (most recent call last):
      File "<string>", line 3, in <module>
    SystemError: TypeError: Cluster.remove_instance: Option 'force' is expected to be of type Bool, but is Null
  traceback: |
    Traceback (most recent call last):
      File "/var/lib/juju/agents/unit-mysql-1/charm/actions/remove-instance", line 299, in remove_instance
        output = instance.remove_instance(address, force=force)
      File "/var/lib/juju/agents/unit-mysql-1/charm/lib/charm/openstack/mysql_innodb_cluster.py", line 813, in remove_instance
        raise e
      File "/var/lib/juju/agents/unit-mysql-1/charm/lib/charm/openstack/mysql_innodb_cluster.py", line 801, in remove_instance
        output = self.run_mysqlsh_script(_script).decode("UTF-8")
      File "/var/lib/juju/agents/unit-mysql-1/charm/lib/charm/openstack/mysql_innodb_cluster.py", line 1436, in run_mysqlsh_script
        return subprocess.check_output(cmd, stderr=subprocess.PIPE)
      File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
        return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
      File "/usr/lib/python3.8/subprocess.py", line 512, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['/snap/bin/mysqlsh', '--no-wizard', '--python', '-f', '/root/snap/mysql-shell/common/tmp_tidatm8.py']' returned non-zero exit status 1.
status: failed
```
Passing a `.yaml` file via `--params`, with a valid YAML boolean for `force` (`true`), gives the same result.
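For completeness, the params file I used was along these lines (the key names match the arguments the traceback shows being passed through to mysqlsh):

```shell
# Recreate the params file for the --params attempt; the address is
# the failed server's, force is a proper YAML boolean.
cat > remove-instance-params.yaml <<'EOF'
address: 172.30.50.10
force: true
EOF

# Invocation (same unit as before):
# juju run-action --wait mysql/1 remove-instance --params remove-instance-params.yaml
```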
Then I figured I could just do it by hand, but that requires knowing how to connect to the cluster. Where the mysql charm stores the password in `/var/lib/mysql/mysql.passwd`, I haven't found an equivalent for this charm. The temporary file `tmp_tidatm8.py` is, of course, temporary, so I can't inspect the values used in it.
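If the credentials can be recovered, I imagine a by-hand removal would look roughly like the sketch below. To be clear about the assumptions: I'm guessing the charm keeps the cluster admin password in Juju leader settings under a key like `cluster-password`, and that the admin user is `clusteruser`; I haven't verified either, and `<surviving-unit-ip>` is a placeholder.

```shell
# ASSUMPTIONS, not verified: the charm may store its cluster admin
# password in Juju leader settings (key name guessed), and the admin
# user may be 'clusteruser'. <surviving-unit-ip> is a placeholder.
PASSWD=$(juju run --unit mysql/leader -- leader-get cluster-password)

# From mysql-shell, drop the dead member's metadata from the cluster;
# the force option skips contacting the unreachable instance.
mysqlsh "clusteruser@<surviving-unit-ip>" --password="$PASSWD" --python -e \
  "dba.get_cluster().remove_instance('172.30.50.10:3306', {'force': True})"
```

The `remove_instance(..., {'force': True})` call is the same AdminAPI operation the charm's action wraps, so this is essentially doing manually what the failing action was trying to do.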
I'm a bit stuck here. And while my ham-fisted forced removal of the unit and machine in Juju can't have helped, I'm looking for a way to restore the cluster rather than doing an entire redeploy.
Can anyone give some pointers on what to do better next time, and what to do now?