OpenStack: ovn-chassis not creating bridge

I’ve just deployed an environment with 6 compute nodes and 3 controller nodes on Ubuntu Focal / OpenStack Ussuri, based on the openstack-base bundle. I deployed OVN with the following juju config:

ovn-chassis:
  ovn-bridge-mappings: physnet1:br-ex
  bridge-interface-mappings: br-ex:bond0

neutron-api:
  neutron-security-groups: true
  default-tenant-network-type: vlan
  dhcp-agents-per-network: 3
  flat-network-providers: physnet1
  vip: 10.11.10.15 10.11.20.15 10.11.30.15
  vlan-ranges: physnet1:1000:2999
  worker-multiplier: 0.25
  openstack-origin: distro-proposed

All the compute hosts got the ovn-chassis subordinate, but for some reason br-ex is not created. I also cannot attach any ports; trying to do so results in the following error:

2020-06-09 18:34:53.604 541449 INFO neutron.plugins.ml2.drivers.ovn.mech_driver.mech_driver [req-943c5dd1-8aeb-42c4-a5fc-05cbe87a0e1e 10aab15997ca48a285af17604279f4ce 6a07c272727b42519431d3c976cef90d - 1d8d0c0d3f4941d8bc3803078519369b 1d8d0c0d3f4941d8bc3803078519369b] Refusing to bind port 5e356fc6-bfb1-4698-93bb-c0e4b091dddc on host supermicro5.maas due to the OVN chassis bridge mapping physical networks [] not supporting physical network: physnet1

bond0 exists on all compute nodes. Could someone point me in the right direction to debug this? Any required information available on request!

Thanks

pinging the @openstack-charmers

I think that there is an issue with OVN and vlan tunnels. The default tunnel type with OVN is geneve and may Just Work[tm] if you use that configuration.

What does the output of ip link look like on one of the compute nodes?

What happens if you use the mac addresses of the underlying interfaces in the bridge-interface-mappings?

That message is most likely a side effect of the ovn-bridge-mappings not being configured on the chassis as a result of the charm not finding the interface you have pointed it at as eligible for some reason. Let’s figure out why.
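To make that reasoning concrete, here is a minimal Python sketch of the kind of check the log message describes. It is not Neutron's or the charm's actual code: both function names are hypothetical, and accepting either spaces or commas as separators is an assumption. The only inputs taken from this thread are the `physnet1:br-ex` mapping and the empty list `[]` from the error.

```python
def parse_bridge_mappings(value):
    """Turn 'physnet1:br-ex physnet2:br-data' into {'physnet1': 'br-ex', ...}.

    Hypothetical helper: accepts space- or comma-separated entries.
    """
    mappings = {}
    for entry in value.replace(",", " ").split():
        physnet, _, bridge = entry.partition(":")
        if not physnet or not bridge:
            raise ValueError(f"malformed mapping entry: {entry!r}")
        mappings[physnet] = bridge
    return mappings


def chassis_supports(chassis_mappings, physnet):
    """A port on `physnet` can only bind if the chassis maps that physnet."""
    return physnet in chassis_mappings


# What the operator asked for in the charm config.
configured = parse_bridge_mappings("physnet1:br-ex")
# What the chassis ends up with if the mapping is never applied.
empty = parse_bridge_mappings("")

print(chassis_supports(configured, "physnet1"))  # True
print(chassis_supports(empty, "physnet1"))       # False
```

With an empty mapping on the chassis, every request for `physnet1` is refused, which matches the "physical networks [] not supporting physical network: physnet1" wording in the log.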

I suspect OP is attempting to use actual VLANs for tenant networks rather than tunneling; that should be supported.

Thank you all for your swift responses!

So we are indeed trying to get our VLAN setup to work with the juju OpenStack cluster we are trying to set up. We can migrate to a tunnel-based setup, but would prefer the VLAN setup if that is possible.

Output of ip link of our nova-compute/0 node:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp67s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
3: enp67s0f1: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
4: enp67s0f2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
5: enp67s0f3: <NO-CARRIER,BROADCAST,MULTICAST,SLAVE,UP> mtu 1500 qdisc mq master bond0 state DOWN mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
7: bond0.1030@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
8: bond0.1040@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
9: bond0.1010@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
10: bond0.1020@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
13: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 76:4b:82:80:74:67 brd ff:ff:ff:ff:ff:ff
14: genev_sys_6081: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 9a:dc:66:11:0b:56 brd ff:ff:ff:ff:ff:ff
15: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 42:00:66:20:80:48 brd ff:ff:ff:ff:ff:ff

For context, our servers have four uplinks to the core switches, bonded with LACP (MLAG setup). bond0 is therefore exposed as the main interface on the nodes, and VLANs on bond0 connect to the different segregated networks (management on the native VLAN, internal 1010, public 1020, admin 1030). Tenant VLANs would be anything > 1030.

Output of our bond:

Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable

Slave Interface: enp67s0f3
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:34:f2:b3
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1

Slave Interface: enp67s0f2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:34:f2:b2
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0

Slave Interface: enp67s0f1
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:34:f2:b1
Slave queue ID: 0
Aggregator ID: 3
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1

Slave Interface: enp67s0f0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: ac:1f:6b:34:f2:b0
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0

So I tried setting the mac instead of the bond0 reference with:

juju config ovn-chassis bridge-interface-mappings=br-ex:ac:1f:6b:34:f2:b0

Afterwards I do indeed get the br-ex bridge, but now I’m wondering whether it is actually mapped to the bond, to the slave interface, or to one of the VLANs, since they all share the same mac address. Figuring that out!
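To see how ambiguous that mac address is, a small Python sketch can group interfaces by mac from `ip link` text. The function name is made up for illustration, and the sample is an abbreviated copy of the output posted above (flags and trailing fields trimmed).

```python
import re
from collections import defaultdict

# "7: bond0.1030@bond0: <...>" -> captures "bond0.1030"
IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)(?:@\S+)?:")
# "    link/ether ac:1f:6b:34:f2:b0 brd ..." -> captures the mac
MAC_RE = re.compile(r"^\s+link/ether\s+([0-9a-f:]{17})\b")


def interfaces_by_mac(ip_link_text):
    """Return {mac: [interface names]} for every link/ether entry."""
    groups = defaultdict(list)
    current = None
    for line in ip_link_text.splitlines():
        match = IFACE_RE.match(line)
        if match:
            current = match.group(1)
            continue
        match = MAC_RE.match(line)
        if match and current:
            groups[match.group(1)].append(current)
    return dict(groups)


# Abbreviated sample based on the `ip link` output in this thread.
SAMPLE = """\
2: enp67s0f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 master bond0 state UP
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 state UP
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
7: bond0.1030@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP
    link/ether ac:1f:6b:34:f2:b0 brd ff:ff:ff:ff:ff:ff
15: br-int: <BROADCAST,MULTICAST> mtu 1500 state DOWN
    link/ether 42:00:66:20:80:48 brd ff:ff:ff:ff:ff:ff
"""

for mac, names in interfaces_by_mac(SAMPLE).items():
    print(mac, names)
```

On this sample the single mac `ac:1f:6b:34:f2:b0` maps to the slave, the bond, and the VLAN interface at once, so the mac alone does not say which device ended up in the bridge.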

I was able to get a listing, and it seems that worked. I will test whether the networking itself works later on.

ubuntu@supermicro1:~$ sudo ovs-vsctl show
4bc48696-af4e-4f9e-b9eb-86fc1e2fdcf5
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-int
        fail_mode: secure
        datapath_type: system
        Port br-int
            Interface br-int
                type: internal
        Port ovn-superm-1
            Interface ovn-superm-1
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.11.10.108"}
        Port ovn-superm-2
            Interface ovn-superm-2
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.11.10.105"}
        Port ovn-superm-4
            Interface ovn-superm-4
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.11.10.106"}
        Port ovn-superm-0
            Interface ovn-superm-0
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.11.10.107"}
        Port ovn-superm-3
            Interface ovn-superm-3
                type: geneve
                options: {csum="true", key=flow, remote_ip="10.11.10.104"}
    Bridge br-ex
        datapath_type: system
        Port br-ex
            Interface br-ex
                type: internal
        Port bond0
            Interface bond0
                type: system
    ovs_version: "2.13.0"
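For anyone who wants to check this on many hosts, here is a rough Python sketch that pulls the bridge-to-port layout out of `ovs-vsctl show` text. The helper name is hypothetical and the parsing is deliberately naive (it only looks at `Bridge` and `Port` lines); the sample is a shortened copy of the listing above.

```python
def ports_by_bridge(show_text):
    """Return {bridge: [port names]} from `ovs-vsctl show` output."""
    bridges = {}
    current = None
    for line in show_text.splitlines():
        words = line.split()
        if len(words) < 2:
            continue
        if words[0] == "Bridge":
            # Some OVS versions quote names, so strip quotes defensively.
            current = words[1].strip('"')
            bridges[current] = []
        elif words[0] == "Port" and current is not None:
            bridges[current].append(words[1].strip('"'))
    return bridges


# Shortened sample based on the listing in this thread.
SAMPLE = """\
4bc48696-af4e-4f9e-b9eb-86fc1e2fdcf5
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port ovn-superm-1
            Interface ovn-superm-1
                type: geneve
    Bridge br-ex
        datapath_type: system
        Port br-ex
            Interface br-ex
                type: internal
        Port bond0
            Interface bond0
                type: system
    ovs_version: "2.13.0"
"""

layout = ports_by_bridge(SAMPLE)
print(layout["br-ex"])            # ['br-ex', 'bond0']
print("bond0" in layout["br-ex"])  # True
```

Here the port on br-ex is bond0 itself, not one of the slaves or VLAN interfaces.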

I can confirm that network connectivity inside the VMs works. I still have issues with IPv6 assignment and “DHCP Ports” not showing up, but that is unrelated to the original problem. So thank you @fnordahl for the suggestion.

As for the workaround, is this the recommended way of assigning an interface to the bridge? Or should (in theory) the bond0 have worked?

Thank you for reporting back!

The bridge-interface-mappings configuration option is meant to carry forward most of the use cases accepted on the neutron-openvswitch and neutron-gateway charms data-port configuration option. As such it should accept both interface names and mac addresses.

I know for certain it works with regular interface names, and you have shown it works with mac addresses: given the mac address of an individual interface, the charm finds the bond it belongs to and then adds the actual bond to the bridge.
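As an illustration of what "accept both interface names and mac addresses" means for the option's format, here is a minimal Python sketch that splits each entry on the first colon and classifies the remainder. This is not the charm's implementation; the function name and the `"name"`/`"mac"` labels are invented for the example, and space-separated entries are assumed.

```python
import re

# Six colon-separated hex octets, e.g. ac:1f:6b:34:f2:b0
MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def parse_bridge_interface_mappings(value):
    """Return a list of (bridge, kind, target) tuples.

    Splitting on the FIRST colon only matters: a mac address contains
    colons itself, so 'br-ex:ac:1f:...' must not be split any further.
    """
    result = []
    for entry in value.split():
        bridge, sep, target = entry.partition(":")
        if not sep or not bridge or not target:
            raise ValueError(f"malformed entry: {entry!r}")
        kind = "mac" if MAC_RE.match(target) else "name"
        result.append((bridge, kind, target))
    return result


print(parse_bridge_interface_mappings("br-ex:bond0"))
print(parse_bridge_interface_mappings("br-ex:ac:1f:6b:34:f2:b0"))
```

The first call yields a `name` entry for bond0 and the second a `mac` entry, which are the two forms tried in this thread.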

So its refusal to use the actual bond name directly may be a bug. If you have a spare moment, we would love it if you could provide details in a bug report on the ovn-chassis Launchpad, and we will investigate further.

I went ahead and registered a bug to track this, and we will have a fix for it shortly.

A fix for bug 1883244 has now been released and is available in cs:ovn-chassis-2 and cs:ovn-dedicated-chassis-2.

How would this work in the context of multiple nodes?
If you have 3 machines, would you do something along the lines of:

juju config ovn-chassis bridge-interface-mappings="br-ex:00:30:48:cf:96:f1 br-ex:00:30:48:cf:af:d3 br-ex:00:30:48:cf:97:bb" ??

If yes, I have some concerns about scalability. Would I need to find the macs of each node before deploying them? That seems like a ton of work for a large environment of cattle.

Okay, so I verified that this does work. I can access VMs via floating IP addresses hanging off bond0 using this mapping, but the juju status output now tells me that the nova and ceph-osd units are unknown/lost because their IPs are tied to the native VLAN of the bond (pxe network).
The good news is that juju does figure out how to get to the unit if I run juju ssh nova-compute/0 etc.

@openstack-charmers My question is: what’s the correct way of deploying nova and ceph-osd units so that their IPs are “bound” to my admin-network instead of the PXE network?
I used this config in my bundle, but they still seem to automagically appear on pxe-network.

  ceph-osd:
    annotations:
      gui-x: '1065'
      gui-y: '1540'
    charm: cs:ceph-osd-303
    num_units: 3
    options:
      osd-devices: *osd-devices
      source: *openstack-origin
    bindings:
      "": admin-network
    to:
    - '0'
    - '1'
    - '2'

LOL OMG. Forgive my stupid question.

In MAAS I’ve been attaching the bond itself to a subnet associated to my pxe space with the assumption that it’s required for the PXE boot to work…

Answer: Just don’t.

:rofl:

PEBCAK

Hello All,

I think I am hitting the error described in this thread, but I am not sure. The bridges I defined for ovn-chassis never get created. When I tail the ovn-chassis log I see the following error:

unit-ovn-chassis-0: 20:36:15 INFO unit.ovn-chassis/0.juju-log ovsdb:74: Invoking reactive handler: reactive/ovn_chassis_charm_handlers.py:83:configure_ovs
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "relation-get"
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "relation-get"
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "relation-list"
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "relation-get"
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "relation-get"
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "relation-get"
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "relation-get"
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "relation-get"
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "relation-ids"
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "relation-get"
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "juju-log"
unit-ovn-chassis-0: 20:36:15 INFO unit.ovn-chassis/0.juju-log ovsdb:74: CompletedProcess(args=('ovs-vsctl', '--no-wait', 'set-ssl', '/etc/ovn/key_host', '/etc/ovn/cert_host', '/etc/ovn/ovn-chassis.crt'), returncode=0, stdout='')
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "network-get"
unit-ovn-chassis-0: 20:36:15 DEBUG jujuc running hook tool "juju-log"
unit-ovn-chassis-0: 20:36:15 ERROR unit.ovn-chassis/0.juju-log ovsdb:74: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-ovn-chassis-0/charm/reactive/ovn_chassis_charm_handlers.py", line 93, in configure_ovs
    charm_instance.configure_ovs(','.join(ovsdb.db_sb_connection_strs))
  File "lib/charms/ovn_charm.py", line 467, in configure_ovs
    .format(self.get_data_ip()), '--',
  File "lib/charms/ovn_charm.py", line 366, in get_data_ip
    ch_core.hookenv.network_get(
  File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 1390, in network_get
    response = subprocess.check_output(
  File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['network-get', 'data', '--format', 'yaml']' returned non-zero exit status 1.

unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed Traceback (most recent call last):
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed   File "/var/lib/juju/agents/unit-ovn-chassis-0/charm/hooks/ovsdb-relation-changed", line 22, in <module>
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed     main()
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed   File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed     bus.dispatch(restricted=restricted_mode)
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed   File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed     _invoke(other_handlers)
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed   File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed     handler.invoke()
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed   File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed     self._action(*args)
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed   File "/var/lib/juju/agents/unit-ovn-chassis-0/charm/reactive/ovn_chassis_charm_handlers.py", line 93, in configure_ovs
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed     charm_instance.configure_ovs(','.join(ovsdb.db_sb_connection_strs))
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed   File "lib/charms/ovn_charm.py", line 467, in configure_ovs
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed     .format(self.get_data_ip()), '--',
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed   File "lib/charms/ovn_charm.py", line 366, in get_data_ip
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed     ch_core.hookenv.network_get(
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed   File "/var/lib/juju/agents/unit-ovn-chassis-0/.venv/lib/python3.8/site-packages/charmhelpers/core/hookenv.py", line 1390, in network_get
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed     response = subprocess.check_output(
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed   File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed   File "/usr/lib/python3.8/subprocess.py", line 512, in run
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed     raise CalledProcessError(retcode, process.args,
unit-ovn-chassis-0: 20:36:15 DEBUG unit.ovn-chassis/0.ovsdb-relation-changed subprocess.CalledProcessError: Command '['network-get', 'data', '--format', 'yaml']' returned non-zero exit status 1.
unit-ovn-chassis-0: 20:36:16 ERROR juju.worker.uniter.operation hook "ovsdb-relation-changed" (via explicit, bespoke hook script) failed: exit status 1
unit-ovn-chassis-0: 20:36:16 DEBUG juju.machinelock machine lock released for uniter (run relation-changed (74; unit: ovn-central/2) hook)
unit-ovn-chassis-0: 20:36:16 DEBUG juju.worker.uniter.operation lock released
unit-ovn-chassis-0: 20:36:16 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-ovn-chassis-0: 20:36:16 DEBUG juju.worker.uniter [AGENT-STATUS] error: hook failed: "ovsdb-relation-changed"
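The traceback only reports the exit status, because `subprocess.check_output` raises `CalledProcessError` without printing what the tool wrote to stderr. A small Python sketch like the one below could be used to surface that message when re-running the same command by hand. The helper name is hypothetical, and since `network-get` is a Juju hook tool, it is assumed to be on the PATH only inside a hook context; anywhere else the sketch simply reports that the command was not found.

```python
import subprocess


def run_and_report(cmd):
    """Run cmd and return (returncode, stdout, stderr) instead of raising."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError as exc:
        # The binary is not available in this environment.
        return 127, "", str(exc)
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Same arguments as the failing call in the traceback.
    code, out, err = run_and_report(["network-get", "data", "--format", "yaml"])
    print("exit status:", code)
    print("stdout:", out.strip())
    print("stderr:", err.strip())
```

Whatever ends up in stderr is usually more specific than "returned non-zero exit status 1" and is the useful part to include in a bug report.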

The relevant charm configs in my bundle are:

variables:
  # Network Spaces
  # This is Management network, unrelated to OpenStack and other applications
  # OAM - Operations, Administration and Maintenance
  maas:                 &maas                 maas-mgmt
  # This is OpenStack Admin network; for adminURL endpoints
  admin:                &admin                os-mgmt
  # This is OpenStack Public network; for publicURL endpoints
  public:               &public               os-ext-api
  # This is OpenStack Internal network; for internalURL endpoints
  internal:             &internal             os-int-api
  # This is Shared Database network; for mysql-routers
  shared-db:            &shared-db            os-mgmt
  # This is the overlay network(s)  
  compute:              &compute              os-compute
  compute-ext:          &compute-ext          os-cloudpublic

applications:
    ovn-central:
        charm: 'cs:ovn-central'
        num_units: 3
        series: focal
        to:
          - 'lxd:0'
          - 'lxd:1'
          - 'lxd:2'
        bindings:
          "": *admin
          ovsdb: *shared-db
          ovsdb-cms: *shared-db
          ovsdb-peer: *shared-db
    ovn-chassis:
        charm: 'cs:ovn-chassis'
        options:
          bridge-interface-mappings: 'br-cloudpublic:eno2 br-compute:eno2.4001'
          ovn-bridge-mappings: 'os-cloudpublic:br-cloudpublic os-compute:br-compute'
        series: focal
        bindings:
          "": *admin
          ovsdb: *shared-db
          data: *compute

In MAAS, I configured each compute node’s interface as such:

I’m at a loss, any ideas?