MAAS + OpenStack: strange (suspected) DNS problem

I have been deploying OpenStack through a charm bundle for a few years and this has worked perfectly so far.
This week I started upgrading to Stein and am running into some problems.

First of all: everything deploys as expected. I am running four compute nodes, and most of the other services are on three other (clustered) nodes.
I used to run Ceph on these nodes with Fibre Channel 3PAR devices, but as these only support 4 Gb/s I wanted to replace them with a “new” storage backend (HP LeftHand) so users could run VMs from volumes instead of from local storage.

I repurposed the 3PAR devices for Glance storage through the Swift charm.

Everything works as expected… but only on one compute node!
When launching a VM, it is only successfully scheduled and booted on one node (blade1). If the VM is scheduled on any of the other blades, nova-compute errors out complaining about Neutron:

2020-01-11 15:42:26.159 23426 INFO nova.virt.libvirt.driver [req-efed8095-b950-44b2-9d97-cc78fc52d44d 56e13ecd83d64f94bc2ce0b771ca279e a9b5f5abd39946c4ae4cf2a3be27d2bd - 35ede6a349084fc782ed50988d2392c6 35ede6a349084fc782ed50988d2392c6] [instance: 6338c46e-6ff6-4409-8725-44150edcaf3c] Ignoring supplied device name: /dev/vda. Libvirt can't honour user-supplied dev names
2020-01-11 15:42:26.544 23426 INFO nova.virt.block_device [req-efed8095-b950-44b2-9d97-cc78fc52d44d 56e13ecd83d64f94bc2ce0b771ca279e a9b5f5abd39946c4ae4cf2a3be27d2bd - 35ede6a349084fc782ed50988d2392c6 35ede6a349084fc782ed50988d2392c6] [instance: 6338c46e-6ff6-4409-8725-44150edcaf3c] Booting with volume-backed-image e34a382a-31d5-4e60-b610-e9335a95f56c at /dev/vda
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager [req-efed8095-b950-44b2-9d97-cc78fc52d44d 56e13ecd83d64f94bc2ce0b771ca279e a9b5f5abd39946c4ae4cf2a3be27d2bd - 35ede6a349084fc782ed50988d2392c6 35ede6a349084fc782ed50988d2392c6] Instance failed network setup after 1 attempt(s): nova.exception.PortBindingFailed: Binding failed for port 420c1a15-7d76-4f13-b943-d02cc14f3c0f, please check neutron logs for more information.
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager Traceback (most recent call last):
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager   File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 1521, in _allocate_network_async
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager     resource_provider_mapping=resource_provider_mapping)
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager   File "/usr/lib/python3/dist-packages/nova/network/neutronv2/api.py", line 1122, in allocate_for_instance
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager     bind_host_id, available_macs, requested_ports_dict)
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager   File "/usr/lib/python3/dist-packages/nova/network/neutronv2/api.py", line 1255, in _update_ports_for_instance
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager     vif.destroy()
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager     self.force_reraise()
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager     six.reraise(self.type_, self.value, self.tb)
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager   File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager     raise value
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager   File "/usr/lib/python3/dist-packages/nova/network/neutronv2/api.py", line 1225, in _update_ports_for_instance
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager     port_client, instance, port_id, port_req_body)
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager   File "/usr/lib/python3/dist-packages/nova/network/neutronv2/api.py", line 580, in _update_port
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager     _ensure_no_port_binding_failure(port)
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager   File "/usr/lib/python3/dist-packages/nova/network/neutronv2/api.py", line 250, in _ensure_no_port_binding_failure
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager     raise exception.PortBindingFailed(port_id=port['id'])
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager nova.exception.PortBindingFailed: Binding failed for port 420c1a15-7d76-4f13-b943-d02cc14f3c0f, please check neutron logs for more information.
2020-01-11 15:42:31.026 23426 ERROR nova.compute.manager 
2020-01-11 15:43:20.840 23426 INFO nova.compute.manager [req-e12eed83-a654-43f8-bc84-149ac31e1492 - - - - -] Updating bandwidth usage cache
2020-01-11 15:43:20.878 23426 INFO nova.compute.manager [req-e12eed83-a654-43f8-bc84-149ac31e1492 - - - - -] Bandwidth usage not supported by libvirt.LibvirtDriver.

The Neutron logs do not show anything interesting (this is from a different capture, but the output is the same):

2020-01-09 14:18:13.745 2301 INFO neutron.common.config [-] Logging enabled!

2020-01-09 14:18:13.745 2301 INFO neutron.common.config [-] /usr/bin/neutron-openvswitch-agent version 14.0.2

2020-01-09 14:18:13.746 2301 INFO os_ken.base.app_manager [-] loading app neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp

2020-01-09 14:18:14.341 2301 INFO os_ken.base.app_manager [-] loading app os_ken.app.ofctl.service

2020-01-09 14:18:14.343 2301 INFO os_ken.base.app_manager [-] loading app os_ken.controller.ofp_handler

2020-01-09 14:18:14.343 2301 INFO os_ken.base.app_manager [-] instantiating app neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_oskenapp of OVSNeutronAgentOSKenApp

2020-01-09 14:18:14.344 2301 INFO os_ken.base.app_manager [-] instantiating app os_ken.app.ofctl.service of OfctlService

2020-01-09 14:18:14.344 2301 INFO os_ken.base.app_manager [-] instantiating app os_ken.controller.ofp_handler of OFPHandler

2020-01-09 14:18:14.346 2301 INFO neutron.agent.agent_extensions_manager [-] Loaded agent extensions: []

2020-01-09 14:18:14.356 2301 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/plugins/ml2/openvswitch_agent.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpl13vn84z/privsep.sock']

2020-01-09 14:18:15.188 2301 INFO oslo.privsep.daemon [-] Spawned new privsep daemon via rootwrap

2020-01-09 14:18:15.094 2340 INFO oslo.privsep.daemon [-] privsep daemon starting

2020-01-09 14:18:15.097 2340 INFO oslo.privsep.daemon [-] privsep process running with uid/gid: 0/0

2020-01-09 14:18:15.100 2340 INFO oslo.privsep.daemon [-] privsep process running with capabilities (eff/prm/inh): CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/none

2020-01-09 14:18:15.100 2340 INFO oslo.privsep.daemon [-] privsep daemon running as pid 2340

2020-01-09 14:18:15.724 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_bridge [-] Bridge br-int has datapath-ID 00003a81cf0cf048

2020-01-09 14:18:16.015 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-f8727373-1b8e-404c-9acc-e0ab37af9f8d - - - - -] Mapping physical network physnet1 to bridge br-ex

2020-01-09 14:18:16.016 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-f8727373-1b8e-404c-9acc-e0ab37af9f8d - - - - -] Bridge br-ex datapath-id = 0x0000e0db55589cf0

2020-01-09 14:18:16.033 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_bridge [req-f8727373-1b8e-404c-9acc-e0ab37af9f8d - - - - -] Bridge br-ex has datapath-ID 0000e0db55589cf0

2020-01-09 14:18:16.057 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ovs_bridge [req-f8727373-1b8e-404c-9acc-e0ab37af9f8d - - - - -] Bridge br-tun has datapath-ID 00003efba5a07f47

2020-01-09 14:18:16.618 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Agent initialized successfully, now running...

2020-01-09 14:18:32.858 2301 INFO neutron.agent.common.ovs_lib [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Port a274a624-2834-4f48-993a-b06c7099408c not present in bridge br-int

2020-01-09 14:18:32.858 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] port_unbound(): net_uuid None not managed by VLAN manager

2020-01-09 14:18:32.860 2301 INFO neutron.agent.securitygroups_rpc [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Remove device filter for ['a274a624-2834-4f48-993a-b06c7099408c']

2020-01-09 14:18:34.858 2301 INFO neutron.agent.common.ovs_lib [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Port c56ec171-3956-4d8e-8859-fa92be0ef26d not present in bridge br-int

2020-01-09 14:18:34.859 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] port_unbound(): net_uuid None not managed by VLAN manager

2020-01-09 14:18:34.859 2301 INFO neutron.agent.securitygroups_rpc [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Remove device filter for ['c56ec171-3956-4d8e-8859-fa92be0ef26d']

2020-01-09 14:19:36.067 2301 INFO neutron.agent.securitygroups_rpc [req-623790e4-ee32-4719-85b3-6dd3bc94517e 87c96abf614044c68dc9cb701cc03324 d1762d6fcca2470da8c041adcc9f9000 - - -] Security group member updated {'b9804401-39d2-44b3-80ca-ab881c6dc567'}

2020-01-09 14:19:36.740 2301 INFO neutron.agent.securitygroups_rpc [req-6fb41fa4-3690-4cad-8b69-dcfe24f7be9f 87c96abf614044c68dc9cb701cc03324 d1762d6fcca2470da8c041adcc9f9000 - - -] Security group member updated {'b9804401-39d2-44b3-80ca-ab881c6dc567'}

2020-01-09 14:19:37.299 2301 INFO neutron.agent.securitygroups_rpc [req-a82a24cb-a8e7-4b0c-970a-10f7bf4a2e8c 87c96abf614044c68dc9cb701cc03324 d1762d6fcca2470da8c041adcc9f9000 - - -] Security group member updated {'b9804401-39d2-44b3-80ca-ab881c6dc567'}

2020-01-09 14:19:37.866 2301 INFO neutron.agent.securitygroups_rpc [req-9dc0cb12-cc85-41a8-b987-7eb5bd752078 87c96abf614044c68dc9cb701cc03324 d1762d6fcca2470da8c041adcc9f9000 - - -] Security group member updated {'b9804401-39d2-44b3-80ca-ab881c6dc567'}

2020-01-09 14:19:37.869 2301 INFO neutron.agent.securitygroups_rpc [req-1582f1b6-c20f-4395-93e3-fa028809ee0c 87c96abf614044c68dc9cb701cc03324 d1762d6fcca2470da8c041adcc9f9000 - - -] Security group member updated {'b9804401-39d2-44b3-80ca-ab881c6dc567'}

2020-01-09 14:19:39.561 2301 INFO neutron.agent.securitygroups_rpc [req-aee87ef8-386a-4980-a9ce-2b83c2df0e8a 87c96abf614044c68dc9cb701cc03324 d1762d6fcca2470da8c041adcc9f9000 - - -] Security group member updated {'b9804401-39d2-44b3-80ca-ab881c6dc567'}

2020-01-09 14:19:39.850 2301 INFO neutron.agent.securitygroups_rpc [req-68100ca6-87fb-4423-97b9-c9cfecfd41ec 87c96abf614044c68dc9cb701cc03324 d1762d6fcca2470da8c041adcc9f9000 - - -] Security group member updated {'b9804401-39d2-44b3-80ca-ab881c6dc567'}

2020-01-09 14:19:40.820 2301 INFO neutron.agent.securitygroups_rpc [req-a9a8a36c-b7b5-4424-96c2-0fb9b1113f53 87c96abf614044c68dc9cb701cc03324 d1762d6fcca2470da8c041adcc9f9000 - - -] Security group member updated {'b9804401-39d2-44b3-80ca-ab881c6dc567'}

2020-01-09 14:19:40.899 2301 INFO neutron.agent.common.ovs_lib [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Port c7d2e3f3-dfda-450c-9afd-22181924abf7 not present in bridge br-int

2020-01-09 14:19:40.899 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] port_unbound(): net_uuid None not managed by VLAN manager

2020-01-09 14:19:40.900 2301 INFO neutron.agent.common.ovs_lib [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Port 73ba8b91-6836-41ed-bc37-f3c34a3fcc65 not present in bridge br-int

2020-01-09 14:19:40.900 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] port_unbound(): net_uuid None not managed by VLAN manager

2020-01-09 14:19:40.900 2301 INFO neutron.agent.common.ovs_lib [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Port 137f7add-592c-42b4-a46b-0b2f3dbe2cee not present in bridge br-int

2020-01-09 14:19:40.901 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] port_unbound(): net_uuid None not managed by VLAN manager

2020-01-09 14:19:40.901 2301 INFO neutron.agent.securitygroups_rpc [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Remove device filter for ['137f7add-592c-42b4-a46b-0b2f3dbe2cee', 'c7d2e3f3-dfda-450c-9afd-22181924abf7', '73ba8b91-6836-41ed-bc37-f3c34a3fcc65']

2020-01-09 14:19:41.740 2301 INFO neutron.agent.securitygroups_rpc [req-aadc1206-3f02-44e4-b85b-782602bc0f4c 87c96abf614044c68dc9cb701cc03324 d1762d6fcca2470da8c041adcc9f9000 - - -] Security group member updated {'b9804401-39d2-44b3-80ca-ab881c6dc567'}

2020-01-09 14:19:42.900 2301 INFO neutron.agent.common.ovs_lib [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Port bda2908f-dff4-4895-ab55-b28893f4bcb1 not present in bridge br-int

2020-01-09 14:19:42.901 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] port_unbound(): net_uuid None not managed by VLAN manager

2020-01-09 14:19:42.901 2301 INFO neutron.agent.securitygroups_rpc [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Remove device filter for ['bda2908f-dff4-4895-ab55-b28893f4bcb1']

2020-01-09 14:22:14.068 2301 INFO neutron.agent.securitygroups_rpc [req-be3d80e5-21f0-47a1-8b6a-1e0edffb36d7 87c96abf614044c68dc9cb701cc03324 d1762d6fcca2470da8c041adcc9f9000 - - -] Security group member updated {'b9804401-39d2-44b3-80ca-ab881c6dc567'}

2020-01-09 14:22:15.001 2301 INFO neutron.agent.common.ovs_lib [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Port 12710502-d857-4de2-8bca-0089aaeb44aa not present in bridge br-int

2020-01-09 14:22:15.002 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] port_unbound(): net_uuid None not managed by VLAN manager

2020-01-09 14:22:15.002 2301 INFO neutron.agent.securitygroups_rpc [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Remove device filter for ['12710502-d857-4de2-8bca-0089aaeb44aa']

2020-01-09 14:22:38.734 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [-] SIGTERM received, capping RPC timeout by 10 seconds.

2020-01-09 14:22:38.742 2301 ERROR neutron.agent.common.async_process [-] Error received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: 2020-01-09T14:22:38Z|00001|fatal_signal|WARN|terminating with signal 15 (Terminated)

2020-01-09 14:22:38.742 2301 ERROR neutron.agent.common.async_process [-] Error received from [ovsdb-client monitor tcp:127.0.0.1:6640 Interface name,ofport,external_ids --format=json]: None

2020-01-09 14:22:39.015 2301 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-3498fdbd-bdde-41f2-adf2-331e24b22e11 - - - - -] Agent caught SIGTERM, quitting daemon loop.

I’ve done some investigation and did not find any differences between the four compute nodes (they are all blades of the same type in the same chassis). The only difference I’ve found (and the one I suspect is the problem) is that somehow the first blade registers itself as ‘blade1’ while the others use a different DNS name pulled out of MAAS.

I suspect this disrupts communication between Nova and Neutron somehow, but I am running out of things to check. I checked the systemd-resolved settings and they are identical across the nodes.
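
For reference, something like the following rough sketch (using the OpenStack SDK; the cloud name “mycloud” is just a placeholder, and it assumes python3-openstacksdk plus a matching clouds.yaml entry) should show whether Nova and Neutron have registered different hostnames for the same blades:

# Sketch: compare the hostname each compute node registered with Nova against
# the host reported by that node's Neutron Open vSwitch agent.
# Assumes python3-openstacksdk is installed and clouds.yaml has an entry "mycloud".
import openstack

conn = openstack.connect(cloud="mycloud")

# Hostnames as Nova sees them (one hypervisor entry per compute node).
nova_hosts = sorted(h.name for h in conn.compute.hypervisors())

# Hosts as Neutron sees them, limited to the OVS L2 agents on the computes.
neutron_hosts = sorted(
    a.host for a in conn.network.agents()
    if a.agent_type == "Open vSwitch agent"
)

print("Nova hypervisors:  ", nova_hosts)
print("Neutron OVS agents:", neutron_hosts)

# Port binding only works when the host Nova sends in the binding request is a
# host Neutron knows about, so a short-name vs FQDN mismatch (e.g. "blade1"
# vs "blade2.maas") should show up here.
mismatch = set(nova_hosts) ^ set(neutron_hosts)
if mismatch:
    print("Hostnames that differ between Nova and Neutron:", mismatch)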

If anyone has some ideas where to look further I’d be grateful!

Tim

Hey @tim.waters

Welcome! I had something similar, but it was the result of me poking things with a stick.
I wanted to change the DNS suffix of my lab after the nodes were deployed. In turn, a couple of things lost their search domains and couldn’t find anything else.

Have you made any modifications to your DHCP/DNS settings on your region or rack controllers recently?

Hey @tim.waters,

Any chance you’re using mismatched space bindings for the OpenStack bits? Could you share the bundle that you’re using to deploy?

Hi @dvnt and @chris.macnaughton,

Thanks for your input. I’ve managed to solve the problem (partly).
The discrepancy between blade1 and the others was unfortunately caused by myself. I had been debugging late into the night, suspected that nova-compute was the culprit, and had separated one blade from the others by defining another application for it. For that app I used a different charm revision of nova-compute (the one I use in my testing environment), which worked. But as the other blades were not downgraded, they still failed. The day after, I forgot I had made these changes…

So I now have a stable OpenStack deployment on bionic/stein, but I am still wondering why rev 309 of the nova-compute charm is failing for me. Any thoughts, or should I file a bug report?

This bug and its related commits to the nova-compute and neutron-openvswitch charms relate to this hostname issue and the potential for Nova and Neutron getting different hostnames. I believe there’s a fix proposed: Bug #1839300 “Instance failover fails at stein due to inconsiste...” : Bugs : OpenStack nova-compute charm.

You may need to make sure your nova-compute:cloud-compute binding and your neutron-openvswitch:neutron-plugin binding are bound to the same space until the proposed fix is released.
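
If it helps, here is a quick way to sanity-check that from the bundle itself (a rough sketch; it assumes PyYAML, a bundle.yaml in the current directory with an “applications” section, and explicit per-endpoint bindings, none of which are guaranteed for your deployment):

# Sketch: verify that the cloud-compute and neutron-plugin endpoints are bound
# to the same space in the deployment bundle. Older bundles may use a top-level
# "services" key instead of "applications".
import yaml

with open("bundle.yaml") as f:
    bundle = yaml.safe_load(f)

apps = bundle.get("applications") or bundle.get("services", {})
nova_space = apps.get("nova-compute", {}).get("bindings", {}).get("cloud-compute")
ovs_space = apps.get("neutron-openvswitch", {}).get("bindings", {}).get("neutron-plugin")

print("nova-compute cloud-compute binding:        ", nova_space)
print("neutron-openvswitch neutron-plugin binding:", ovs_space)
print("OK" if nova_space == ovs_space else "MISMATCH - bind both endpoints to the same space")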

@afreiberger Thanks! I did check the space bindings and they are in fact in the same space, so I don’t think that workaround is going to work for me.

I’m not in a hurry, so I’ll just wait for the fix to trickle down.

The fix for the FQDN settings for nova-compute and neutron-openvswitch has been fix-released.

Thanks! I’ll check later this month, but I’ve no doubt this will fix it.