My charm cannot handle a db relation when it is deployed to multiple units

I wrote a charm for an app, and up until now I’ve only deployed the app to test environments using only one unit. I tried deploying it with two units and it can’t handle the db connection properly. What happens is that one of the two units will have the expected state for a bit, then they both eventually settle into a state where they report the db not being connected.

Here’s the code that checks for connections and sets a connected state. The salient states involved are:

  • myapp.db.connected
  • db.master.available

At first, after joining the db relation, one of the units reports a status of “ok”. Eventually both units report “waiting for myapp.db.connected”.

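import traceback

from charmhelpers.core import unitdata
from charmhelpers.core.hookenv import config, log, status_set
from charms.reactive import clear_flag, get_flags, hook, set_flag, when, when_not

# `services`, `db` and `configure_services` are the charm's own helpers (not shown).


@hook("db-relation-joined", "db-relation-changed")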
def db_relation_joined(pgsql):
    log("db relation joined or changed, setting name and roles", level="debug")
    cfg = config()
    pgsql.set_database(cfg["database_name"])
    pgsql.set_roles(cfg["database_roles"])
    # clear flags in case connection settings have changed
    clear_flag("myapp.db.connected")
    clear_flag("myapp.admin.ready")


@hook("db-relation-departed", "db-relation-broken")
def db_relation_departed_or_broken(pgsql):
    store = unitdata.kv()
    store.set("incoming_database_url", "")
    log("db relation departed or broken. stopping myappgunicorn", level="debug")
    services.stop("myappgunicorn")
    clear_flag("myapp.db.connected")
    clear_flag("myapp.admin.ready")


@when("myapp.installed", "db.master.available")
@when_not("myapp.db.connected")
def connect_db_services(pgsql):
    cfg = config()

    # NOTE: Sometimes the database that becomes available is not the one
    # requested via set_database() on the pgsql interface when the relation
    # was joined. This workaround is needed for now.
    # https://lists.ubuntu.com/archives/juju/2017-February/008649.html
    master = pgsql and pgsql.master
    if master is None or master.dbname != cfg["database_name"]:
        log("db %s not yet available" % cfg["database_name"], level="warning")
        return

    log("db %s available, configuring..." % cfg["database_name"], level="debug")
    # update environment with new database_url
    storage = unitdata.kv()
    storage.set("incoming_database_url", pgsql.master.uri)

    if cfg["admin_mode"]:
        try:
            db.grant_all_privileges(pgsql.master, cfg["database_roles"])
            log("admin mode active", level="debug")
            set_flag("myapp.admin.ready")
            configure_services()
            set_flag("myapp.db.connected")
            update_status()
        except Exception:
            log("unable to grant database role privileges", level="error")
            log(traceback.format_exc(), level="error")
            clear_flag("myapp.admin.ready")
            status_set("blocked", "unable to grant database role privileges")
    else:
        configure_services()
        set_flag("myapp.db.connected")
        update_status()


@hook("update-status")
def update_status():
    """inform operators when myapp is installed and connected"""
    active_flags = get_flags()
    cfg = config()
    blockers = []

    # ...

    # Check for relations
    if "myapp.db.connected" not in active_flags:
        log("waiting for db relation to settle", level="debug")
        blockers.append("myapp.db.connected")

    # ...

    if blockers:
        status_set("blocked", "waiting for {}".format(" ".join(blockers)))
    else:
        status_set("active", "ok")

I do not know if this topic is relevant to anyone else, but I’m following up just in case.

I noticed in the logs that when the relation is joined, the change hook fires off over and over. In my handler, the flags are unset in case someone changed the db settings.

In theory, someone might want to, but in fact, no one ever has and this charm isn’t used by anyone but me. I decided to shotgun debug this by removing those config options and no longer clearing the flags.

Now the units are behaving like I expect.
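
A less drastic option than dropping the config options might be to clear the flags only when the requested settings actually differ from what was last sent. Below is a minimal sketch of that idea, not what the charm does today. `FakeKV` is a hypothetical stand-in that mimics only the `get`/`set` calls of `unitdata.kv()`, and `connection_settings_changed` and the `db_settings_snapshot` key are names made up for illustration.

```python
class FakeKV:
    """Hypothetical stand-in mimicking only get()/set() of unitdata.kv()."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def connection_settings_changed(store, cfg):
    """Return True only if the db name or roles differ from the last snapshot.

    The snapshot is saved whenever a difference is seen, so repeated hook
    runs with identical config return False.
    """
    snapshot = {
        "database_name": cfg.get("database_name"),
        "database_roles": cfg.get("database_roles"),
    }
    if store.get("db_settings_snapshot") == snapshot:
        return False
    store.set("db_settings_snapshot", snapshot)
    return True


if __name__ == "__main__":
    store = FakeKV()
    cfg = {"database_name": "myapp", "database_roles": "myapp_rw"}
    print(connection_settings_changed(store, cfg))  # first call, nothing stored yet
    print(connection_settings_changed(store, cfg))  # same settings again
```

In the db-relation-changed handler, the two `clear_flag` calls would then presumably sit behind `if connection_settings_changed(unitdata.kv(), config()):`, so that a change hook firing repeatedly with the same settings leaves the flags alone.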

On a tangent, I looked for other charms that use the db interface and the mailman3-web-charm is very readable. Even though they don’t use hooks for the most part, it was still instructive to read.

Am glad that things are working for you, even if the solution was to blow things up.

This is a good tip, thanks. I’ve also created a mysql interface reference document that should help.

I notice you also have a pgsql interface reference in progress. The conversation table is helpful.

Do you think flag descriptions would be helpful too?

It turns out I did not solve my problem, and I do not know why it didn’t show up when I ran multiple units locally but did when I ran them in my remote environment. Scraping through my logs, I think my unit might have lost db.master.available when I added the other unit.

I notice in the mailman charm I was looking at for comparison that they do not have a handler for when that flag is cleared. I am rethinking my handler.
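
A sketch of the rule such a handler could enforce, under assumptions: when db.master.available goes away, the flags that depend on it go too, so that connect_db_services can run again once the master is back. The name `reconcile_db_flags` is hypothetical, and it works on a plain set of flag names so it can be tried outside a charm. In the charm itself the same rule would presumably be a handler decorated with `@when("myapp.db.connected")` and `@when_not("db.master.available")` that calls `clear_flag`.

```python
# Flags that only make sense while the PostgreSQL master is available.
DEPENDENT_FLAGS = {"myapp.db.connected", "myapp.admin.ready"}


def reconcile_db_flags(active_flags):
    """Return the flags that should remain set.

    If "db.master.available" is missing, every flag that depends on it
    is dropped; otherwise the flags are returned unchanged.
    """
    flags = set(active_flags)
    if "db.master.available" not in flags:
        flags -= DEPENDENT_FLAGS
    return flags


if __name__ == "__main__":
    # The master has disappeared but the unit still believes it is connected.
    print(sorted(reconcile_db_flags({"myapp.installed", "myapp.db.connected"})))
```

Whether the service should also be stopped at that point, as the departed/broken handler does, is a separate decision this sketch does not make.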

Yes, absolutely. For people maintaining reactive charm layers/interfaces, describing what their flags are is hugely important.