How do you use hacluster?

Hi,

I’m trying to enable HA on some components that support it through the hacluster charm.
Everything seems to be correct, but only 1 of the 3 units seems to work properly.
The other 2 report an error saying that apache2 is not running.
SSHing into one of those units, I can see that the certificate corresponding to the VIP has not been generated on the non-leader units. The apache configuration expects it, so apache fails to start because the certificate is missing.
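To confirm which certificate file apache is looking for on a failing unit, here is a minimal Python sketch. It scans the enabled site configs for the mod_ssl SSLCertificateFile and SSLCertificateKeyFile directives and lists every referenced path that does not exist. It assumes the charm's vhost configs live under /etc/apache2/sites-enabled/ and end in .conf; that location is an assumption, not something confirmed in this thread.

```python
import re
import sys
from pathlib import Path

# mod_ssl directive names; the captured group is the path that follows them.
DIRECTIVE_RE = re.compile(
    r"^\s*(SSLCertificateFile|SSLCertificateKeyFile)\s+(\S+)", re.IGNORECASE
)


def missing_ssl_files(sites_dir="/etc/apache2/sites-enabled"):
    """Return (config name, directive, path) tuples whose referenced file is missing."""
    missing = []
    for conf in sorted(Path(sites_dir).glob("*.conf")):
        for line in conf.read_text().splitlines():
            match = DIRECTIVE_RE.match(line)
            if not match:
                continue
            directive, path = match.groups()
            path = path.strip('"')
            # Path.exists() follows symlinks, so a dangling symlink is reported too.
            if not Path(path).exists():
                missing.append((conf.name, directive, path))
    return missing


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/etc/apache2/sites-enabled"
    for conf_name, directive, path in missing_ssl_files(target):
        print(f"{conf_name}: {directive} -> {path} (missing)")
```

Run it with sudo on a blocked unit (for example through juju ssh); any path it prints is a file apache2 expects at startup but cannot find.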

Why is that?
Why is this certificate not generated on the non-leader units?
It happened with every app I tried to configure for HA, such as Glance, Heat, Keystone, …
They all fail for the same reason.

Here is a snippet from my bundle:

  heat:
    charm: cs:heat
    num_units: 3
    to:
    - lxd:0
    - lxd:1
    - lxd:2
    options:
      vip: "192.168.210.224 192.168.211.224"
    bindings:
      "": internal
      admin: public
      ha: public
      public: public

  heat-hacluster:
    charm: cs:hacluster
    bindings:
      "": internal

  heat-mysql-router:
    charm: cs:mysql-router
    bindings:
      "": internal

relations:
- - heat:ha
  - heat-hacluster:ha
- - heat:identity-service
  - keystone:identity-service
- - heat:amqp
  - rabbitmq-server:amqp
- - heat:shared-db
  - heat-mysql-router:shared-db
- - heat-mysql-router:db-router
  - mysql-innodb-cluster:db-router
- - heat:certificates
  - vault:certificates

Just to clarify: I have no issue with Vault, and this deployment works like a charm when my components are not in HA. It only fails when I scale them up with hacluster.

Here is the end result:

App                Version  Status   Scale  Charm         Store       Rev  OS      Notes
heat               14.0.0   blocked      3  heat          jujucharms  277  ubuntu
heat-hacluster              active       3  hacluster     jujucharms   69  ubuntu
heat-mysql-router  8.0.21   active       3  mysql-router  jujucharms    3  ubuntu

Unit                    Workload  Agent  Machine  Public address  Ports              Message
heat/0                  blocked   idle   0/lxd/2  192.168.210.36  8000/tcp,8004/tcp  Services not running that should be: apache2
  heat-hacluster/2      active    idle            192.168.210.36                     Unit is ready and clustered
  heat-mysql-router/2   active    idle            192.168.210.36                     Unit is ready
heat/1                  blocked   idle   1/lxd/2  192.168.210.23  8000/tcp,8004/tcp  Services not running that should be: apache2
  heat-hacluster/1      active    idle            192.168.210.23                     Unit is ready and clustered
  heat-mysql-router/1   active    idle            192.168.210.23                     Unit is ready
heat/2*                 blocked   idle   2/lxd/1  192.168.210.41  8000/tcp,8004/tcp  Services not running that should be: apache2
  heat-hacluster/0*     active    idle            192.168.210.41                     Unit is ready and clustered
  heat-mysql-router/0*  active    idle            192.168.210.41                     Unit is ready

Hello @Hybrid512
What you described sounds like what’s been going on in this thread: [BUG] openstack hacluster apache2 service not running, wrong ssl cert name.

The workaround I used was to manually create the symlinks to the certs on the failing units. If you look at the status of the apache2 service (systemctl status apache2), it will tell you which cert it fails to find at startup. From there you can go to that path and run sudo ln -s to create the links.
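The manual symlink workaround can be scripted. This is a hedged Python sketch, assuming the layout shown later in this thread: one regular cert_&lt;hostname&gt; / key_&lt;hostname&gt; pair in /etc/apache2/ssl/&lt;service&gt;/, with the missing entries named cert_&lt;vip&gt; and key_&lt;vip&gt;. The function names and the script name are hypothetical, and the charm may rewrite these files on a later hook.

```python
import sys
from pathlib import Path


def find_source_name(ssl_dir):
    """Return the suffix of the single regular (non-symlink) cert_<name> file."""
    names = [
        p.name[len("cert_"):]
        for p in Path(ssl_dir).glob("cert_*")
        if p.is_file() and not p.is_symlink()
    ]
    if len(names) != 1:
        raise RuntimeError(f"expected one regular cert_* file in {ssl_dir}, found {names}")
    return names[0]


def link_vip_certs(ssl_dir, vip, source_name=None):
    """Create cert_<vip> and key_<vip> symlinks to the existing hostname-based files.

    Entries that already exist are left untouched; returns the links created.
    """
    ssl_dir = Path(ssl_dir)
    source_name = source_name or find_source_name(ssl_dir)
    created = []
    for prefix in ("cert_", "key_"):
        target = ssl_dir / f"{prefix}{source_name}"
        link = ssl_dir / f"{prefix}{vip}"
        if not target.is_file():
            raise FileNotFoundError(target)
        if link.exists() or link.is_symlink():
            continue  # do not clobber anything the charm already wrote
        # Absolute target, like the links in the `ls -l` output in this thread.
        link.symlink_to(target.resolve())
        created.append(str(link))
    return created


if __name__ == "__main__":
    # Hypothetical usage: sudo python3 link_vip_certs.py /etc/apache2/ssl/heat 172.20.0.142
    for path in link_vip_certs(sys.argv[1], sys.argv[2]):
        print(f"created {path}")
```

After creating the links, restart apache2 (sudo systemctl restart apache2) so it picks up the certificate. Run the script once per VIP if the vip option lists several addresses.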

After some investigation together with @Hybrid512, we found that there is some easily reproducible randomness when Vault is used with the auto-unlock feature (the problem seems less likely to happen without it). Here is a simple bundle:

series: focal

applications:
  mysql-innodb-cluster:
    charm: cs:mysql-innodb-cluster
    num_units: 3

  rabbitmq-server:
    charm: cs:rabbitmq-server
    num_units: 1

  vault:
    charm: cs:vault
    num_units: 1
    options:
      totally-unsecure-auto-unlock: true
      auto-generate-root-ca-cert: true
  vault-mysql-router:
    charm: cs:mysql-router
    num_units: 0

  heat:
    charm: cs:heat
    num_units: 3
    options:
      vip: "172.20.0.142 172.20.0.143"
  heat-hacluster:
    charm: cs:hacluster
    num_units: 0
  heat-mysql-router:
    charm: cs:mysql-router
    num_units: 0

  keystone:
    charm: cs:keystone
    num_units: 1
  keystone-mysql-router:
    charm: cs:mysql-router
    num_units: 0

relations:
- - heat:ha
  - heat-hacluster:ha
- - heat:shared-db
  - heat-mysql-router:shared-db
- - heat-mysql-router:db-router
  - mysql-innodb-cluster:db-router
- - heat:amqp
  - rabbitmq-server:amqp

- - keystone:shared-db
  - keystone-mysql-router:shared-db
- - keystone-mysql-router:db-router
  - mysql-innodb-cluster:db-router

- - vault:shared-db
  - vault-mysql-router:shared-db
- - vault-mysql-router:db-router
  - mysql-innodb-cluster:db-router

- - heat:certificates
  - vault:certificates
- - keystone:certificates
  - vault:certificates
- - heat:identity-service
  - keystone:identity-service

After the deployment has settled:

$ juju run -a heat -- ls -l /etc/apache2/ssl/heat/
- Stdout: |
    total 16
    lrwxrwxrwx 1 root root   72 Oct 13 14:33 cert_172.20.0.23 -> /etc/apache2/ssl/heat/cert_juju-0f0343-lourot-heat-0.project.serverstack
    -rw-r----- 1 root root 1548 Oct 13 14:33 cert_juju-0f0343-lourot-heat-0.project.serverstack
    lrwxrwxrwx 1 root root   71 Oct 13 14:33 key_172.20.0.23 -> /etc/apache2/ssl/heat/key_juju-0f0343-lourot-heat-0.project.serverstack
    -rw-r----- 1 root root 1674 Oct 13 14:33 key_juju-0f0343-lourot-heat-0.project.serverstack
  UnitId: heat/0
- Stdout: |
    total 16
    lrwxrwxrwx 1 root root   72 Oct 13 14:35 cert_172.20.0.142 -> /etc/apache2/ssl/heat/cert_juju-0f0343-lourot-heat-1.project.serverstack
    -rw-r----- 1 root root 1548 Oct 13 14:35 cert_juju-0f0343-lourot-heat-1.project.serverstack
    lrwxrwxrwx 1 root root   71 Oct 13 14:35 key_172.20.0.142 -> /etc/apache2/ssl/heat/key_juju-0f0343-lourot-heat-1.project.serverstack
    -rw-r----- 1 root root 1674 Oct 13 14:35 key_juju-0f0343-lourot-heat-1.project.serverstack
  UnitId: heat/1
- Stdout: |
    total 16
    lrwxrwxrwx 1 root root   72 Oct 13 14:32 cert_172.20.0.18 -> /etc/apache2/ssl/heat/cert_juju-0f0343-lourot-heat-2.project.serverstack
    -rw-r----- 1 root root 1548 Oct 13 14:32 cert_juju-0f0343-lourot-heat-2.project.serverstack
    lrwxrwxrwx 1 root root   71 Oct 13 14:32 key_172.20.0.18 -> /etc/apache2/ssl/heat/key_juju-0f0343-lourot-heat-2.project.serverstack
    -rw-r----- 1 root root 1674 Oct 13 14:32 key_juju-0f0343-lourot-heat-2.project.serverstack
  UnitId: heat/2

As you can see, only heat/1 got a cert file name mentioning the VIP (172.20.0.142), although this unit isn’t even the current leader.
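Checking this by eye gets tedious on bigger models, so here is a minimal Python sketch that reads the output of the juju run ... ls -l command above from stdin and reports, per unit, whether a cert_&lt;vip&gt; symlink exists for any of the given VIPs. It assumes the Juju 2.x juju run output format shown above, where each unit's block ends with its UnitId line; the script name is hypothetical.

```python
import re
import sys

# Matches the link name in lines like "... cert_172.20.0.23 -> /etc/apache2/...".
CERT_LINK_RE = re.compile(r"\scert_(\S+) -> ")
UNIT_RE = re.compile(r"^\s*UnitId:\s*(\S+)")


def vip_cert_report(output, vips):
    """Map each unit to True if it has a cert_<vip> symlink for any of the VIPs."""
    report = {}
    links = []
    for line in output.splitlines():
        link = CERT_LINK_RE.search(line)
        if link:
            links.append(link.group(1))
            continue
        unit = UNIT_RE.match(line)
        if unit:
            # In this output format the UnitId line closes each unit's block.
            report[unit.group(1)] = any(vip in links for vip in vips)
            links = []
    return report


if __name__ == "__main__":
    # Hypothetical usage:
    #   juju run -a heat -- ls -l /etc/apache2/ssl/heat/ | python3 check_vip_certs.py 172.20.0.142 172.20.0.143
    for unit, ok in vip_cert_report(sys.stdin.read(), sys.argv[1:]).items():
        print(f"{unit}: {'has' if ok else 'MISSING'} VIP cert symlink")
```

Fed the listing above with both VIPs, it should flag heat/0 and heat/2 as missing and heat/1 as OK.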

Together with @thedac, we think this might have the same root cause as Bug #1893847, “Certificates are not created”, in the OpenStack nova-cloud-controller charm.


This is a very interesting find. I too have that option enabled for Vault in all of the bundles I’ve deployed:

  vault: # Scale up with hacluster post deployment
    charm: cs:vault
    num_units: 1
    to:
    - lxd:0
    options:
      vip: *vault-vip
      auto-generate-root-ca-cert: true
    annotations:
      gui-x: "1610"
      gui-y: "1430"
    bindings:
      "": *admin
      ha: *admin
      shared-db: *shared-db
  vault-mysql-router:
    charm: cs:mysql-router
    annotations:
      gui-x: "1535"
      gui-y: "1560"
    bindings:
      "": *admin
      shared-db: *shared-db
  vault-hacluster:
    charm: 'cs:hacluster'
    series: focal
    bindings:
      "": *admin
      ha: *admin
    options:
      cluster_count: 3

In my bundle I deploy 1 Vault unit, wait for MySQL to settle down with all the cluster/database creation it’s doing, and then scale Vault up by 2 units to satisfy hacluster. I do the same with openstack-dashboard and a few other charms. I took this route after noticing a pattern in my CI/CD pipeline: if I deploy a bundle with all the units needed for HA, charms end up in a failed ha-relation-changed hook state. I have a higher success rate if I deploy just 1 unit of each charm, wait until everything has settled, and then scale up the remaining units for each hacluster-backed charm. To me it looks like a race condition, but I’m not quite sure.
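The "deploy 1 unit, wait, then scale up" approach can be automated in a pipeline. Below is a hedged Python sketch: it polls juju status --format=json until every unit (including subordinates) reports an accepted workload state and an idle agent, then runs juju add-unit &lt;app&gt; -n &lt;count&gt;. The JSON key layout (applications, units, workload-status, juju-status, subordinates) is assumed from Juju 2.x output, and the function names are hypothetical.

```python
import json
import subprocess
import time


def _unit_settled(unit, ok_workload):
    workload = unit.get("workload-status", {}).get("current")
    agent = unit.get("juju-status", {}).get("current")
    return workload in ok_workload and agent == "idle"


def model_settled(status, ok_workload=("active",)):
    """True if every unit and subordinate is in an accepted workload state and idle."""
    for app in status.get("applications", {}).values():
        for unit in (app.get("units") or {}).values():
            if not _unit_settled(unit, ok_workload):
                return False
            for sub in (unit.get("subordinates") or {}).values():
                if not _unit_settled(sub, ok_workload):
                    return False
    return True


def wait_then_scale(app, extra_units, poll_seconds=30, timeout_seconds=3600):
    """Poll `juju status` until the model settles, then add units to `app`."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        out = subprocess.run(
            ["juju", "status", "--format=json"],
            check=True, capture_output=True, text=True,
        ).stdout
        if model_settled(json.loads(out)):
            break
        if time.monotonic() > deadline:
            raise TimeoutError("model did not settle before the timeout")
        time.sleep(poll_seconds)
    subprocess.run(["juju", "add-unit", app, "-n", str(extra_units)], check=True)


if __name__ == "__main__":
    # Mirrors the workflow above: 1 Vault unit first, then 2 more for hacluster.
    wait_then_scale("vault", 2)
```

A single settled snapshot can be momentary on a heavily loaded model, so you may want to require several consecutive settled polls. If some applications legitimately stay blocked (for example a manually unsealed Vault), pass them through ok_workload instead of waiting forever.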

I would say that too. I tried (and retried… and retried…) without those “hacky” options, unsealing Vault manually instead. It works, but it didn’t really change the situation.
I still have some charms with broken certificate symlinks, and this happens randomly.
As a hint, my bundle is quite big and my machines are very heavily loaded during deployment (high CPU usage but also high I/O usage). To me this is probably a race condition; in any case, it is not reliable.