Updated podspec YAML - new features

Background

Juju k8s charms communicate to Juju the artifacts needed to provision their workloads.

The guiding principle is that we model everything that is generic.
The initial model was deliberately quite small in the entities that were defined.

As the kubeflow k8s charms were developed, additional k8s functionality needed to be delivered in the pod YAML that Juju passes through when deploying the charms.

V1 Features

As a reminder, here’s what was done for v1.

Juju defines a substrate agnostic model which the charms use to specify what they want.
Key concepts include:

  • containers
  • image path, access secrets
  • ports
  • resource limits via constraints (mem, cpu power supported)
  • affinity via constraint tags
  • config files created on the workload filesystem
  • workload config via environment variables
  • storage (via the standard Juju storage modelling)
  • security (run as root, allow privilege escalation etc)
  • k8s specific custom resources

Charms specify what they need in a YAML file and send it to the controller using the pod-spec-set hook command.

A curated subset of k8s specific sections was included in the primary YAML file, eg the liveness probe.
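To illustrate, here's a minimal sketch of a v1 pod spec covering some of the concepts above; the container name, image, port and file contents are purely illustrative.

containers:
  - name: gitlab
    image: gitlab/latest
    ports:
      - containerPort: 80
        protocol: TCP
    # workload config delivered as environment variables
    config:
      GITLAB_DB_HOST: mysql
    # config files created on the workload filesystem
    files:
      - name: configurations
        mountPath: /etc/gitlab
        files:
          gitlab.rb: |
            external_url 'http://gitlab.example'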

V2 Features

Firstly, we introduce a version attribute to allow us to maintain compatibility with v1 and also allow subsequent improvements (eg additions to what we model).

version: 2

k8s specific artifacts will be specified using a separate YAML file to keep a clean separation of what’s modelled and what’s k8s. What is added to the k8s specific YAML is an opinionated and curated subset of what’s possible using kubectl and native k8s YAML directly.

We add support for missing features:

  • config maps
  • service accounts
  • workload permissions and capabilities
  • secrets
  • custom resources

We split the YAML into 2 files - one for core modelling concepts that map well to the Juju model, and the other for k8s specific things like CustomResourceDefinitions, Custom Resources, and Secrets.

$ pod-spec-set spec.yaml --k8s-resources resources.yaml

Most charms will not need any k8s specific resources, so that YAML file is an optional parameter.
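In the common case where no k8s specific resources are needed, only the core spec file is passed:

$ pod-spec-set spec.yaml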

Charm metadata.yaml

We added a minimum k8s version attribute, similar to the min-juju-version attribute in charm metadata. This will live with the other k8s deployment attributes.

deployment:
    min-version: x.y
    type: stateless | stateful
    service: loadbalancer | cluster | omit

Note: service: omit in the deployment section is now used instead of omitServiceFrontend in the podspec YAML.

Changes to the podspec YAML

The following sections describe v2 specific changes to the podspec YAML file passed as the first argument to pod-spec-set.

Workload permissions and capabilities

We allow a set of rules to be associated with the application to confer capabilities on the workload; a set of rules constitutes a role. If a role is required for an application, Juju will create a service account for the application with the same name as the application. Juju takes care of the internal k8s details, like creating a role binding, automatically.

Some applications may require cluster scoped roles. Use global: true if cluster scoped rules are required.

serviceAccounts:
  automountServiceAccountToken: true
  # roles are usually scoped to the model namespace, but
  # some workloads like istio require binding to cluster wide roles
  # use global = true for cluster scoped roles
  global: true
  #
  # these rules are based directly on role rules supported by k8s
  rules:
    - apiGroups: [""] # "" indicates the core API group
      resources: ["pods"]
      verbs: ["get", "watch", "list"]
    - nonResourceURLs: ["*"]
      verbs: ["*"]

Config Maps

These are essentially named databags.

configMaps:
  mydata:
    foo: bar
    hello: world

Scale Policy

As well as setting annotations, it’s now possible to set the scale policy for services, ie whether the workload pods are started serially (one at a time) or in parallel. The default is parallel.

service:
  scalePolicy: serial
  annotations:
    foo: bar

k8s specific container attributes

k8s specific container attributes like liveness probes and security context info are now in their own section under each container definition.

containers:
  - name: gitlab
    image: gitlab/latest
    kubernetes:
      securityContext:
        runAsNonRoot: true
        privileged: true
      livenessProbe:
        initialDelaySeconds: 10
        httpGet:
          path: /ping
          port: 8080
      readinessProbe:
        initialDelaySeconds: 10
        httpGet:
          path: /pingReady
          port: www

K8s Specific YAML

This YAML includes things like custom resources and their associated custom resource definitions, as well as secrets etc. All of the following are passed to Juju by placing any required sections in the file passed via the --k8s-resources argument to pod-spec-set.

The YAML syntax is curated from the native k8s YAML to remove the boilerplate and other unnecessary cruft, leaving the business attributes. Here’s an example of defining a custom resource definition and a custom resource. These could well be done by different charms, but are shown together here for brevity.

kubernetesResources:
  customResourceDefinitions:
    tfjobs.kubeflow.org:
      group: kubeflow.org
      scope: Namespaced
      names:
        kind: TFJob
        singular: tfjob
        plural: tfjobs
      versions:
        - name: v1
          served: true
          storage: true
      subresources:
        status: {}
      validation:
        openAPIV3Schema:
          properties:
            spec:
              properties:
                tfReplicaSpecs:
                  properties:
                    # The validation works when the configuration contains
                    # `Worker`, `PS` or `Chief`. Otherwise it will not be validated.
                    Worker:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                    PS:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                    Chief:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                          maximum: 1
    tfjob1s.kubeflow.org1:
      group: kubeflow.org1
      scope: Namespaced
      names:
        kind: TFJob1
        singular: tfjob1
        plural: tfjob1s
      versions:
        - name: v1
          served: true
          storage: true
      subresources:
        status: {}
      validation:
        openAPIV3Schema:
          properties:
            spec:
              properties:
                tfReplicaSpecs:
                  properties:
                    # The validation works when the configuration contains
                    # `Worker`, `PS` or `Chief`. Otherwise it will not be validated.
                    Worker:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                    PS:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                    Chief:
                      properties:
                        replicas:
                          type: integer
                          minimum: 1
                          maximum: 1
  customResources:
    tfjobs.kubeflow.org:
      - apiVersion: "kubeflow.org/v1"
        kind: "TFJob"
        metadata:
          name: "dist-mnist-for-e2e-test"
        spec:
          tfReplicaSpecs:
            PS:
              replicas: 2
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0
            Worker:
              replicas: 8
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0
    tfjob1s.kubeflow.org1:
      - apiVersion: "kubeflow.org1/v1"
        kind: "TFJob1"
        metadata:
          name: "dist-mnist-for-e2e-test11"
        spec:
          tfReplicaSpecs:
            PS:
              replicas: 2
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0
            Worker:
              replicas: 8
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0
      - apiVersion: "kubeflow.org1/v1"
        kind: "TFJob1"
        metadata:
          name: "dist-mnist-for-e2e-test12"
        spec:
          tfReplicaSpecs:
            PS:
              replicas: 2
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0
            Worker:
              replicas: 8
              restartPolicy: Never
              template:
                spec:
                  containers:
                    - name: tensorflow
                      image: kubeflow/tf-dist-mnist-test:1.0

Secrets

Secrets will ultimately be modelled by Juju. We’re not there yet, so initially we add the secrets definitions to the k8s specific YAML file. The syntax and supported attributes are tied directly to the k8s spec. Both string and base64 encoded data are supported.

  secrets:
    - name: build-robot-secret
      type: Opaque
      stringData:
          config.yaml: |-
              apiUrl: "https://my.api.com/api/v1"
              username: fred
              password: shhhh
    - name: another-build-robot-secret
      type: Opaque
      data:
          username: YWRtaW4=
          password: MWYyZDFlMmU2N2Rm

Pod Attributes

k8s specific pod attributes are defined in their own section.

  pod:
    restartPolicy: OnFailure
    activeDeadlineSeconds: 10
    terminationGracePeriodSeconds: 20
    securityContext:
      runAsNonRoot: true
      supplementalGroups: [1,2]
    readinessGates:
      - conditionType: PodScheduled
    dnsPolicy: ClusterFirstWithHostNet

By the way, is there a published spec for podspec v1?

The new K8s Spec V3 is documented here - K8s Spec v3 changes

I was able to create a ConfigMap using the v2 spec but couldn’t find anything about mounting said ConfigMap as a volume in one or more containers. Is there any documentation on this? I tried to read through the code with the limited Go knowledge that I have but from what I see, there seems to be no way to reference a ConfigMap. Am I missing something?

The new V3 spec lets you create volumes backed by config maps.

You can also use the “files” section to specify text files which will be created and placed at specified locations in the workload (using a config map, but that’s transparent to the charm).

See Writing a Kubernetes charm

files:
  - name: configurations
    mountPath: /etc/mysql/conf.d
    files:
      custom_mysql.cnf: |
        [mysqld]
        skip-host-cache
        skip-name-resolve
        query_cache_limit = 1M
        query_cache_size = %(query-cache-size)s

What is the minimum Kubernetes version supported by Juju?

At the time of writing, k8s 1.15 and 1.16 are approaching EOL. Juju works with 1.14 and most likely a version or two earlier, but these old versions are now out of upstream support.

How can we set limits in the pod_spec? i.e. for a manifest like this:

    spec:
      containers:
      - args:
        - --port=7472
        - --config=config
        image: metallb/controller:v0.9.3
        imagePullPolicy: Always
        name: controller
        ports:
        - containerPort: 7472
          name: monitoring
        resources:
          limits:
            cpu: 100m
            memory: 100Mi

How do I set this in the pod_spec language? I tried this and it failed:

'containers': [{
    'name': 'controller',
    'image': 'metallb/controller:v0.9.3',
    'imagePullPolicy': 'Always',
    'ports': [{
        'containerPort': 7472,
        'protocol': 'TCP',
        'name': 'monitoring'
    }],
    'resources': {
        'limits': {
            'cpu': '100m',
            'memory': '100Mi',
        }
    },
}]

Resource limits are modelled as constraints.

Just as with vm (non k8s) charms, constraints are applied at deploy time using --constraints, eg

juju deploy foo --constraints "mem=100M cpu-power=100"

There has been discussion at various times about allowing charms to specify minimum resource requirements, but from memory that’s not been progressed to the point where it’s become something that Juju models.

Hi Ian, thank you. I saw this, and it is constraints for the VMs/Machines that juju deploys, hosting the kubernetes master/workers, etc., right? I am interested in pod constraints, inside of kubernetes.

When deploying k8s charms, --constraints apply to the pods.

This is distinct from using Juju to deploy Charmed Kubernetes, in which case --constraints apply to the vms/machines used to host k8s master/workers.
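For the metallb example above, something like the following should apply equivalent pod resource limits (the charm name here is illustrative, and the exact mapping of constraint values to k8s cpu/memory limits is handled by Juju):

juju deploy metallb-controller --constraints "mem=100M cpu-power=100"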

Is it possible to make a more complex config map? I want to reproduce something like this in a charm:

apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250

but I get errors like “b’ERROR json: cannot unmarshal object into Go struct field podSpecV3.configmaps of type string\n’”

I managed to achieve this with:

'configMaps': {
    'config': {
        'config' : 'address-pools:\n- name: default\n  protocol: layer2\n  addresses:\n  - 192.168.1.240-192.168.1.250'
    }
}
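
In the YAML form of the spec, the same string value can be written more readably with a literal block scalar; this is a sketch based on the configMaps section above:

configMaps:
  config:
    config: |
      address-pools:
      - name: default
        protocol: layer2
        addresses:
        - 192.168.1.240-192.168.1.250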

Can you provide more details about what this guarantees? Is a second pod spun up immediately after the first pod is spun up and containers started, or is a second pod spun up after the first pod passes a readiness check? I have a setup that cannot be run in parallel and I need to ensure other pods don’t start their entrypoint until the first pod has performed the setup, and I can’t find any reliable locking mechanism.

Here’s the k8s reference

https://kubernetes.io/docs/tutorials/stateful-application/basic-stateful-set/#pod-management-policy

With serial policy, k8s waits for each pod to become “Running”. So that means all containers in the pod have to be deemed ready; they can provide a readiness endpoint to assist k8s in this if needed.

So I think this will do what you want.
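
For example, combining the scale policy and readiness probe pieces shown earlier (container name, image, path and port are illustrative), so that each pod is only started once the previous one reports ready:

service:
  scalePolicy: serial
containers:
  - name: setup-sensitive-app
    image: example/app:latest
    kubernetes:
      readinessProbe:
        initialDelaySeconds: 10
        httpGet:
          path: /ready
          port: 8080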


Is there documentation regarding what is accepted in the curated YAML? I am facing issues trying to feed a CRD yaml file to the pod_spec.

There’s some doc here which has a couple of examples:

https://juju.is/docs/charm-writing/kubernetes

There are also some kubeflow charms which can be used to get some insight into real world examples, eg
https://github.com/juju-solutions/bundle-kubeflow/tree/master/charms