
Create VMs with proper FQDN instead of shortname only #2151

Closed

Conversation

cjeanner
Contributor

@cjeanner cjeanner commented Jul 24, 2024

Some infrastructures need unique hostnames for the VMs that mount
shared volumes.

Using FQDNs instead of shortnames makes it possible to change the
domain of the VMs, making each FQDN actually unique for each run.

The way we will leverage that feature is still unknown, but the support
will be in place whenever we need it.

One possibility would be to set cifmw_devscripts_config.base_domain to
a unique value, or to directly set cifmw_reproducer_domain to whatever
value the user wants.
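
(For illustration, a minimal sketch of those two options as plain scenario variables; the domain values here are purely hypothetical:)

cifmw_reproducer_domain: "run-42.openstack.lab"

# or, through the devscripts configuration:
cifmw_devscripts_config:
  base_domain: "run-42.lab"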

Testing (not in commit message)

  • delta-v6 (downstream TP)
  • va-hci

Contributor

openshift-ci bot commented Jul 24, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Jul 24, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from cjeanner. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@fultonj fultonj self-requested a review July 24, 2024 14:13

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5aaff74c058842819528087fc9206c82

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 03s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 11m 54s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 19m 54s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 10m 15s
✔️ cifmw-pod-pre-commit SUCCESS in 9m 17s
✔️ cifmw-baremetal-nested-crc SUCCESS in 1h 07m 20s
✔️ cifmw-molecule-libvirt_manager SUCCESS in 24m 27s
❌ cifmw-molecule-reproducer FAILURE in 17m 27s

@cjeanner cjeanner force-pushed the libvirt_manager/use-fqdn branch from f9fce14 to 27771cf on July 26, 2024 08:16

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/566df85bfdcf4b08a0c852d6ac5b7008

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 39m 04s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 13m 44s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 29m 18s
❌ podified-multinode-hci-deployment-crc FAILURE in 55m 11s
✔️ adoption-standalone-to-crc-ceph-provider SUCCESS in 2h 24m 26s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 32s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 50s
✔️ cifmw-molecule-cifmw_cephadm SUCCESS in 4m 27s
✔️ cifmw-molecule-libvirt_manager SUCCESS in 23m 42s
❌ cifmw-molecule-reproducer FAILURE in 19m 09s

@cjeanner cjeanner force-pushed the libvirt_manager/use-fqdn branch from 27771cf to eded598 on July 26, 2024 11:00
@cjeanner cjeanner changed the title from "Switch to FQDN" to "Create VMs with proper FQDN instead of shortname only" on Jul 26, 2024
@cjeanner cjeanner force-pushed the libvirt_manager/use-fqdn branch from eded598 to c70c24f on July 26, 2024 11:30
@cjeanner cjeanner force-pushed the libvirt_manager/use-fqdn branch from c70c24f to 6dfb897 on July 26, 2024 11:36
ansible.builtin.shell:
  executable: /bin/bash
  cmd: |-
    test -d /home/zuul && exit 0;
    set -xe -o pipefail;
    echo "{{ vm }}" | sudo tee /etc/hostname;
    sudo hostname -F /etc/hostname;
    sudo hostnamectl hostname "{{ _fqdn }}"

Yes, let hostnamectl set it.
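
(As a side note, a minimal sketch of the same idea using the stock Ansible module instead of raw shell; "use: systemd" delegates to hostnamectl, and _fqdn is the variable from the task above:)

- name: Set the static hostname to the FQDN
  become: true
  ansible.builtin.hostname:
    name: "{{ _fqdn }}"
    use: systemd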

The canonical hostname of the EDPM node, if it is hosting Nova, is required to be the ctlplane FQDN.

So cifmw_reproducer_domain must always equal the ctlplane network domain in the network attachment definition.

Without this it will break live migration and TLS, and it could potentially cause issues with adoption too.
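
(To illustrate the constraint with hypothetical values: the canonical hostname is the first name resolving for the node's ctlplane IP, e.g. through /etc/hosts, and it must be the ctlplane FQDN:

192.168.122.100  compute-0.ctlplane.example.com  compute-0

If cifmw_reproducer_domain diverges from the ctlplane network domain, the hostname set on the VM and the name Nova resolves will no longer match.)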

If you set the hostname to the FQDN and use the same value in the nodeset spec's node hostName, it will be used as canonical_hostname and resolve to the ctlplane network IP. I guess you're not updating the nodeset spec with the correct hostName.
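
(For illustration, a sketch of what that could look like; field names follow the dataplane-operator OpenStackDataPlaneNodeSet CRD, all values hypothetical:)

apiVersion: dataplane.openstack.org/v1beta1
kind: OpenStackDataPlaneNodeSet
metadata:
  name: openstack-edpm
spec:
  nodes:
    edpm-compute-0:
      hostName: compute-0.ocp.openstack.lab  # FQDN matching the VM's hostname
      ansible:
        ansibleHost: 192.168.122.100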

@fultonj
Contributor

fultonj commented Jul 27, 2024

This looks good to me. Let's get it green.

@fultonj fultonj requested review from fmount and katarimanojk July 27, 2024 13:32
@katarimanojk
Contributor

/LGTM

@fultonj
Contributor

fultonj commented Aug 1, 2024

Let's make sure it passes all downstream tests before applying the Looks Good to Me.

@cjeanner cjeanner force-pushed the libvirt_manager/use-fqdn branch 5 times, most recently from 3069ac6 to a4ee079 on August 7, 2024 06:20
@cjeanner
Contributor Author

cjeanner commented Aug 7, 2024

@fultonj Status update on the testing side:

  • I could get va-hci deployed; ceph is passing and seems to work as it should
  • I could get a downstream TP with delta-v6, hitting up to tempest,
    so I guess we're in a good position for that one as well?

The expected ceph spec now shows FQDNs such as:

[root@compute-0 specs]# grep -B1 -A3 hosts ceph_spec.yaml 
placement:
  hosts:
  - compute-0.ocp.openstack.lab
  - compute-1.ocp.openstack.lab
  - compute-2.ocp.openstack.lab
--
placement:
  hosts:
  - compute-0.ocp.openstack.lab
  - compute-1.ocp.openstack.lab
  - compute-2.ocp.openstack.lab
--
placement:
  hosts:
  - compute-0.ocp.openstack.lab
  - compute-1.ocp.openstack.lab
  - compute-2.ocp.openstack.lab

And it seems to match the applied spec:

[root@compute-0 specs]# ceph orch ls --export | grep hosts -A 3
  hosts:
  - compute-0.ocp.openstack.lab
  - compute-1.ocp.openstack.lab
  - compute-2.ocp.openstack.lab
--
  hosts:
  - compute-0.ocp.openstack.lab
  - compute-1.ocp.openstack.lab
  - compute-2.ocp.openstack.lab
--
  hosts:
  - compute-0.ocp.openstack.lab
  - compute-1.ocp.openstack.lab
  - compute-2.ocp.openstack.lab
--
  hosts:
  - compute-0.ocp.openstack.lab
  - compute-1.ocp.openstack.lab
  - compute-2.ocp.openstack.lab
--
  hosts:
  - compute-0.ocp.openstack.lab
  - compute-1.ocp.openstack.lab
  - compute-2.ocp.openstack.lab
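
(One more cross-check that could be run at this point with the standard cephadm CLI is "ceph orch host ls", which lists the hosts as registered with the orchestrator; they should all show up with their FQDN.)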

Some more status data, to ensure services are all good:

NAME                                     HOST                         PORTS              STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION           IMAGE ID      CONTAINER ID  
crash.compute-0                          compute-0.ocp.openstack.lab                     running (27m)    10m ago  27m    6899k        -  18.2.1-229.el9cp  3fd804e38f5b  5a1e264a1b8e  
crash.compute-1                          compute-1.ocp.openstack.lab                     running (26m)    10m ago  26m    6899k        -  18.2.1-229.el9cp  3fd804e38f5b  25ab5956e3c8  
crash.compute-2                          compute-2.ocp.openstack.lab                     running (24m)     9m ago  24m    6899k        -  18.2.1-229.el9cp  3fd804e38f5b  ebd1fb107a8a  
haproxy.rgw.default.compute-0.rvzvwe     compute-0.ocp.openstack.lab  *:8080,8999        running (23m)    10m ago  23m    7884k        -  2.4.22-f8e3218    9824f391cf33  357e3867cec2  
keepalived.rgw.default.compute-0.phszxy  compute-0.ocp.openstack.lab                     running (23m)    10m ago  23m    1820k        -  2.2.8             50a7c3295bb6  212207ea130b  
mds.cephfs.compute-0.wtomnd              compute-0.ocp.openstack.lab                     running (22m)    10m ago  22m    19.6M        -  18.2.1-229.el9cp  3fd804e38f5b  b04489a0b8ad  
mds.cephfs.compute-1.tcmatt              compute-1.ocp.openstack.lab                     running (22m)    10m ago  22m    15.6M        -  18.2.1-229.el9cp  3fd804e38f5b  38999c2a6d2d  
mds.cephfs.compute-2.fqozrn              compute-2.ocp.openstack.lab                     running (22m)     9m ago  22m    15.2M        -  18.2.1-229.el9cp  3fd804e38f5b  3310ccacba94  
mgr.compute-0.qtxjox                     compute-0.ocp.openstack.lab  *:9283,8765        running (28m)    10m ago  28m     494M        -  18.2.1-229.el9cp  3fd804e38f5b  b2b73082e14d  
mgr.compute-1.gxygzd                     compute-1.ocp.openstack.lab  *:8765             running (24m)    10m ago  24m     444M        -  18.2.1-229.el9cp  3fd804e38f5b  f5ffb7941ec2  
mgr.compute-2.ioeinc                     compute-2.ocp.openstack.lab  *:8765             running (24m)     9m ago  24m     444M        -  18.2.1-229.el9cp  3fd804e38f5b  82262fdf99eb  
mon.compute-0                            compute-0.ocp.openstack.lab                     running (28m)    10m ago  28m    50.9M    2048M  18.2.1-229.el9cp  3fd804e38f5b  66219c42805b  
mon.compute-1                            compute-1.ocp.openstack.lab                     running (24m)    10m ago  24m    43.3M    2048M  18.2.1-229.el9cp  3fd804e38f5b  7168aa2be477  
mon.compute-2                            compute-2.ocp.openstack.lab                     running (24m)     9m ago  24m    41.3M    2048M  18.2.1-229.el9cp  3fd804e38f5b  f4b742cf2656  
osd.0                                    compute-1.ocp.openstack.lab                     running (25m)    10m ago  25m    88.5M    4096M  18.2.1-229.el9cp  3fd804e38f5b  68b3f76e0318  
osd.1                                    compute-0.ocp.openstack.lab                     running (25m)    10m ago  25m    88.1M    4096M  18.2.1-229.el9cp  3fd804e38f5b  ec061c707acd  
osd.2                                    compute-2.ocp.openstack.lab                     running (24m)     9m ago  24m    77.9M    4096M  18.2.1-229.el9cp  3fd804e38f5b  cbff9816ae66  
rgw.rgw.compute-0.lepjhf                 compute-0.ocp.openstack.lab  172.18.0.102:8082  running (23m)    10m ago  23m    91.6M        -  18.2.1-229.el9cp  3fd804e38f5b  d42da69e2285  
rgw.rgw.compute-1.rchsnk                 compute-1.ocp.openstack.lab  172.18.0.100:8082  running (23m)    10m ago  23m    94.0M        -  18.2.1-229.el9cp  3fd804e38f5b  259132a6fac7  
rgw.rgw.compute-2.lchbql                 compute-2.ocp.openstack.lab  172.18.0.101:8082  running (23m)     9m ago  23m    92.0M        -  18.2.1-229.el9cp  3fd804e38f5b  93f8a310b76c  

Note: this is from my va-hci deployment; I'll do the same checks on the delta-v6. But since it's hitting tempest, I'd expect it to be all happy as well.

@fultonj
Contributor

fultonj commented Aug 7, 2024

Looks promising. Let's plan to get this merged early next week. Thank you @cjeanner

@cjeanner cjeanner force-pushed the libvirt_manager/use-fqdn branch from a4ee079 to 85bca7c on August 8, 2024 07:40
@cjeanner cjeanner force-pushed the libvirt_manager/use-fqdn branch from 85bca7c to 5e1abd0 on August 9, 2024 06:14
@cjeanner
Contributor Author

cjeanner commented Aug 9, 2024

Notes:

Otherwise, it seems to be a plain success. Some issues may still surface in cases where tests or other tooling were relying on inconsistent patterns, "hoping" they would work (that was the case for neutron whitebox), but we should have squashed most of those issues beforehand.

@fultonj
Contributor

fultonj commented Aug 12, 2024

The patches under https://review.opendev.org/q/topic:%22use-startswith%22 have merged.

I assume the plan is to run tests again to confirm that the merges produced the expected results, and then, if it's green, to merge this patch.

@tosky
Contributor

tosky commented Aug 12, 2024

> The patches under review.opendev.org/q/topic:%22use-startswith%22 have merged.
>
> I assume the plan to run tests again to confirm that the merges produced the expected results and then if it's green to merge this patch.

There may be other issues on the nova side with live migration, I was told.

@cjeanner
Contributor Author

This leads to far too many issues with FQDN inconsistency in multi-NIC environments. With the DCN and spine/leaf cases, we can't reliably generate things properly.

Dropping this PR; I've already lost far too much time on it with little to no benefit overall. To get unique hostnames, I'll modify the libvirt_manager to introduce a random string in the shortname, and correct the various matches/extractions to take that into account.
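
(For the record, a minimal sketch of that random-shortname idea; the variable names are hypothetical and not the actual follow-up implementation:)

- name: Build a per-run unique shortname (hypothetical)
  ansible.builtin.set_fact:
    _vm_name: "{{ vm }}-{{ lookup('password', '/dev/null chars=ascii_lowercase,digits length=5') }}"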

@cjeanner cjeanner closed this Aug 13, 2024
@cjeanner cjeanner deleted the libvirt_manager/use-fqdn branch August 13, 2024 08:30