Using the model cache

Initial ideas

The main driver behind creating the model cache was to reduce read load on mongo. Given that the models really aren’t that big, holding each model in memory and serving reads from the cache seemed appealing.

Secondly, there is a desire to create a business-logic tier between the apiserver and the state layer. The idea is to move business logic down from the apiserver, so that the apiserver is purely about exposing information over the API, and up out of the state package, so that state becomes purely a persistence mechanism. We could then provide a fake persistence layer to test the business logic, and those tests would run exceedingly fast.

However getting there from here is a bit of a problem.

First trial (success) - model config

One of the problems with the coarse-grained model-config watchers that the agents use is that any change to any part of the model config causes the workers to wake up. The workers then ask for the model config only to find that the bit they cared about hadn’t changed, so they go back to sleep. However, many workers depend on model config, and in a deployment with, say, 1000 units, changing a single configuration value can cause in the range of 10k wake-ups and requests for model config. Every one of those read requests hits the database.

What we did was to create a model config watcher where the caller could specify the keys that they were interested in. Workers that used this watcher could safely also get the configuration from the cache.
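The shape of such a key-filtered watcher can be sketched as below. This is a minimal illustration, not Juju’s actual implementation: the type names, the `Update` hook, and the buffered-channel signalling are all assumptions made for the example.

```go
package main

import "fmt"

// ConfigWatcher is a hypothetical sketch of a model-config watcher
// that only wakes its worker when one of the caller's keys of
// interest actually changes value.
type ConfigWatcher struct {
	keys []string          // keys the caller cares about
	last map[string]string // last seen values for those keys
	out  chan struct{}     // buffered signal channel
}

func NewConfigWatcher(initial map[string]string, keys ...string) *ConfigWatcher {
	w := &ConfigWatcher{keys: keys, last: map[string]string{}, out: make(chan struct{}, 1)}
	for _, k := range keys {
		w.last[k] = initial[k]
	}
	return w
}

// Update is called by the cache on every model-config change; it only
// signals the worker if a watched key differs from its last seen value.
func (w *ConfigWatcher) Update(cfg map[string]string) {
	changed := false
	for _, k := range w.keys {
		if cfg[k] != w.last[k] {
			w.last[k] = cfg[k]
			changed = true
		}
	}
	if changed {
		select {
		case w.out <- struct{}{}:
		default: // a wake-up is already pending; coalesce
		}
	}
}

// Changes is the channel the worker selects on.
func (w *ConfigWatcher) Changes() <-chan struct{} { return w.out }

func main() {
	cfg := map[string]string{"logging-config": "INFO", "http-proxy": ""}
	w := NewConfigWatcher(cfg, "logging-config")

	// An unrelated key changes: no wake-up is queued.
	w.Update(map[string]string{"logging-config": "INFO", "http-proxy": "http://p"})
	fmt.Println("pending after unrelated change:", len(w.Changes()) > 0)

	// A watched key changes: the worker is signalled.
	w.Update(map[string]string{"logging-config": "DEBUG"})
	fmt.Println("pending after watched change:", len(w.Changes()) > 0)
}
```

The point of the `select` with a `default` arm is that repeated changes coalesce into a single pending wake-up, rather than queueing one event per change.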

Second trial (issues) - charm config

The aim here was to serve charm configuration from the cache, with the cache holding the logic for dealing with branch configuration (née generations).

The problem was the unit agent setting the charm URL and then immediately requesting the charm configuration. Charm configuration depends on the charm version, so if the cache is not yet up to date with the database change, the agent may get configuration for an old, or even missing, charm.
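The race can be illustrated with a toy simulation. Everything here is hypothetical scaffolding (the struct names, the charm URLs, the config values); the only point is that a write landing in the database before the cache has processed the change event leaves a read via the cache serving stale data.

```go
package main

import "fmt"

// db stands in for the authoritative store: the current charm URL
// plus the per-charm configuration.
type db struct {
	charmURL string
	configs  map[string]map[string]string // charm URL -> config
}

// cache stands in for the model cache; its view of the charm URL may
// lag behind the database.
type cache struct {
	charmURL string
}

func main() {
	d := &db{
		charmURL: "ch:demo-1",
		configs: map[string]map[string]string{
			"ch:demo-1": {"port": "80"},
			"ch:demo-2": {"port": "8080"},
		},
	}
	c := &cache{charmURL: d.charmURL} // cache is in sync... for now

	// The unit agent sets the new charm URL directly in the database...
	d.charmURL = "ch:demo-2"

	// ...and immediately asks for charm config. Served via the cache's
	// (stale) charm URL, it gets the old charm's configuration.
	fmt.Println("config served:", d.configs[c.charmURL])
}
```

With the cache one event behind, the lookup keys off `ch:demo-1` even though the database already says `ch:demo-2`; worse, if the old charm document had been removed, the lookup would find nothing at all.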

Next trial target - status

juju status is one of the areas where the user accepts slight latency. The operator knows that the agents are busy bringing the world into a state that matches the planned model.

If we have everything we need for status available in the cache, we should be able to respond to status calls even on the biggest models in less than 50ms. If we can make it seem immediate for the operator, this is a big deal.

Rules for cache usage

  1. Watchers can be added to the cache and used by agents or clients.
  2. Use the database as the source of data in the general case.

There will be exceptions to rule 2, but at this stage they are exceptions. As long as we take watcher events from the cache but read data from the database, we will always be good enough.
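The hybrid pattern the two rules describe can be sketched as follows. The interfaces and names are illustrative assumptions, not Juju’s actual API: the idea is simply that a stale cache can at worst delay a notification, but can never corrupt a read, because the data itself still comes from the database.

```go
package main

import "fmt"

// Cache is a hypothetical slice of the model cache: it only supplies
// change notifications (rule 1).
type Cache interface {
	WatchApplication(name string) <-chan struct{}
}

// State is a hypothetical slice of the state layer: it supplies the
// authoritative data (rule 2).
type State interface {
	ApplicationConfig(name string) (map[string]string, error)
}

// onChange wakes on a cache event, then reads the value from state,
// so the response is correct even if the cache itself is behind.
func onChange(c Cache, st State, app string) (map[string]string, error) {
	<-c.WatchApplication(app)           // rule 1: watch via the cache
	return st.ApplicationConfig(app)    // rule 2: read via the database
}

// Fakes for demonstration only.
type fakeCache chan struct{}

func (f fakeCache) WatchApplication(string) <-chan struct{} { return f }

type fakeState map[string]string

func (f fakeState) ApplicationConfig(string) (map[string]string, error) {
	return f, nil
}

func main() {
	events := make(fakeCache, 1)
	events <- struct{}{} // a change has already happened
	cfg, err := onChange(events, fakeState{"port": "8080"}, "demo")
	if err != nil {
		panic(err)
	}
	fmt.Println("config:", cfg)
}
```

Note that the fakes here also hint at the testing goal from the introduction: with state behind a small interface, the business logic can be exercised against an in-memory fake.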


We have a recently added mechanism to ensure that the cache is primed with the working model before we attempt to use it, but there is no general logic to make the same guarantee for other entities.

As it stands, this means a watcher must not depend on any other cache data being present at the time it is instantiated.

The charm-configuration case was a series of land-mines related to the complexity of the uniter:

  • Install is an exception to the normal hook execution workflow.
  • The uniter sets the unit charm upon installation, which it immediately relies upon for retrieving and watching charm config.
  • Subordinate units only come into being when relations are entered, so an agent that is fast relative to the controller can hit the cache for data that is not yet present.

One option would have been to “double down” on cache usage, so that we didn’t have the hybrid case of events triggered directly from state that subsequently required cache usage, but this was not acceptable given the risk and time constraints.