Add the steps to reboot the computes after update. #2587

Open
sathlan wants to merge 1 commit into main

Contributor

@sathlan sathlan commented Dec 5, 2024

This sequence reboots the compute nodes after the update. It does not
run by default; set `cifmw_update_reboot_test` to true to activate it.

One instance is created. If the hypervisor being rebooted hosts that
instance, the instance is live-migrated to another hypervisor before
the reboot and migrated back to the original hypervisor after the
reboot.

Some basic sanity checks are performed after the reboot and before the
migration back to ensure that the necessary services are up and
running.

During the reboot we start two scripts. One monitors and logs the
reboot of the hypervisors. The other logs where the instance is
currently running. The log files can be found in
`~/ci-framework-data/tests/update/` in `monitor_servers.log` and
`monitor_vm_placement.log` respectively.
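
As an illustration only, the kind of line a placement monitor could write might look like the sketch below. This is not the script from this PR: the function name, the line format, and the host names are assumptions made for the example, and a real monitor would obtain each sample by querying the cloud rather than from a prepared list.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Tuple


def placement_log_lines(samples: Iterable[Tuple[datetime, str]]) -> List[str]:
    """Turn (timestamp, hypervisor) samples into log lines.

    Hypothetical format: the real monitor_vm_placement.log layout
    is not described in this PR.
    """
    return [f"{when.isoformat()} instance on {host}" for when, host in samples]


# Hypothetical samples taken before and after a live migration.
example_samples = [
    (datetime(2024, 12, 5, 10, 0, 0, tzinfo=timezone.utc), "compute-0"),
    (datetime(2024, 12, 5, 10, 0, 5, tzinfo=timezone.utc), "compute-1"),
]
example_lines = placement_log_lines(example_samples)
```

Reading such a log top to bottom would show when the instance left the hypervisor being rebooted and when it came back.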

A note about node evacuation. We are still using node evacuation from
the nova CLI. This command has not been ported to the openstack
CLI. There's a discussion about it [on launchpad](https://bugs.launchpad.net/python-openstackclient/+bug/2055552).

Also, we do the evacuation only if more than one hypervisor is
available. When only one compute is available, we stop the instance
and, after the reboot, simply restart it.
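
The decision described above can be sketched as a small planning function. This is a simplified illustration, not the Ansible tasks in this PR: the function name, the step strings, and the host names are hypothetical, and running the sanity check in the single-compute case is an assumption.

```python
from typing import List


def reboot_plan(target: str, instance_host: str, hypervisors: List[str]) -> List[str]:
    """Return the ordered steps for rebooting the hypervisor `target`.

    Hypothetical sketch: step names are illustrative only.
    """
    others = [h for h in hypervisors if h != target]

    # The instance is elsewhere: reboot and check, nothing to move.
    if instance_host != target:
        return [f"reboot {target}", f"sanity-check {target}"]

    # More than one hypervisor: move the instance away and back.
    if others:
        dest = others[0]
        return [
            f"live-migrate instance {target} -> {dest}",
            f"reboot {target}",
            f"sanity-check {target}",
            f"live-migrate instance {dest} -> {target}",
        ]

    # Single compute: stop the instance, reboot, then restart it.
    # (Running the sanity check here is an assumption.)
    return [
        "stop instance",
        f"reboot {target}",
        f"sanity-check {target}",
        "start instance",
    ]
```

For example, `reboot_plan("compute-0", "compute-0", ["compute-0"])` yields the stop, reboot, check, start sequence, while adding a second hypervisor to the list yields the migrate-away and migrate-back sequence.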

The official documentation mentions only the live-migration path, but
as we also use live migration in the test sequence, that part is
covered. We still expect customers to use the nova CLI, as it's far
more user-friendly and still works.

Closes: https://issues.redhat.com/browse/OSPRH-8937

@github-actions github-actions bot marked this pull request as draft December 5, 2024 10:25

github-actions bot commented Dec 5, 2024

Thanks for the PR! ❤️
I'm marking it as a draft. Once you're happy with it merging and the PR is passing CI, click the "Ready for review" button below.

@sathlan
Contributor Author

sathlan commented Dec 5, 2024

Currently tested with a ping test running in the background; no loss of connectivity was found.


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/4f12a60d8dc64dad92d4f7b5f5bec990

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 40m 14s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 19m 06s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 28m 19s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 44s
cifmw-pod-pre-commit FAILURE in 8m 14s
✔️ build-push-container-cifmw-client SUCCESS in 21m 46s
cifmw-molecule-update FAILURE in 5m 04s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/bf06493eaeea430c898bf25520dfdd04

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 32s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 15m 44s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 31m 55s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 7m 45s
cifmw-pod-pre-commit FAILURE in 7m 28s
✔️ build-push-container-cifmw-client SUCCESS in 21m 20s
cifmw-molecule-update FAILURE in 5m 20s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/caf50b2e762a4aaeb3326c9399dffd15

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 21m 02s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 18m 47s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 28m 48s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 43s
cifmw-pod-pre-commit FAILURE in 7m 37s
✔️ build-push-container-cifmw-client SUCCESS in 36m 42s
cifmw-molecule-update FAILURE in 4m 29s

@sathlan sathlan force-pushed the update-reboot branch 11 times, most recently from c056d3c to 85e4367 Compare December 18, 2024 14:14
@sathlan sathlan added enhancement New feature or request and removed do-not-merge/work-in-progress labels Dec 19, 2024
@sathlan sathlan marked this pull request as ready for review December 19, 2024 08:45
@sathlan sathlan requested a review from a team as a code owner December 19, 2024 08:45

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/b30a7822761f41f3890d3b541e5e5bd6

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 42m 25s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 17m 09s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 30m 30s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 7m 16s
cifmw-pod-pre-commit FAILURE in 6m 30s
✔️ build-push-container-cifmw-client SUCCESS in 37m 11s
✔️ cifmw-molecule-update SUCCESS in 5m 14s

@sathlan sathlan force-pushed the update-reboot branch 2 times, most recently from 5b95216 to 0217325 Compare January 13, 2025 14:36

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/87194aa88e3d4b5e90652416f40b57cb

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 44m 32s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 18m 33s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 22m 16s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 00s
cifmw-pod-pre-commit FAILURE in 7m 41s
✔️ build-push-container-cifmw-client SUCCESS in 36m 36s
✔️ cifmw-molecule-update SUCCESS in 5m 23s


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/0030f67185a246beaa6641ac614816e3

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 43m 19s
podified-multinode-edpm-deployment-crc POST_FAILURE in 1h 14m 15s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 29m 16s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 52s
✔️ cifmw-pod-pre-commit SUCCESS in 7m 58s
✔️ build-push-container-cifmw-client SUCCESS in 22m 17s
✔️ cifmw-molecule-update SUCCESS in 5m 35s

@ciecierski
Contributor

Recheck


Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/f6409174cbdc4deba2df5666caf90894

✔️ openstack-k8s-operators-content-provider SUCCESS in 1h 25m 38s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 11m 48s
cifmw-crc-podified-edpm-baremetal RETRY_LIMIT in 25m 38s
✔️ noop SUCCESS in 0s
✔️ cifmw-pod-ansible-test SUCCESS in 8m 02s
✔️ cifmw-pod-pre-commit SUCCESS in 8m 04s
✔️ build-push-container-cifmw-client SUCCESS in 22m 22s
✔️ cifmw-molecule-update SUCCESS in 4m 29s

@jistr
Contributor

jistr commented Jan 30, 2025

LGTM

@sathlan
Contributor Author

sathlan commented Jan 30, 2025

recheck

@jistr
Contributor

jistr commented Jan 30, 2025

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Jan 30, 2025

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/70768976357c4ef3aa9e64c478f55901

✔️ openstack-k8s-operators-content-provider SUCCESS in 2h 47m 51s
✔️ podified-multinode-edpm-deployment-crc SUCCESS in 1h 14m 51s
✔️ cifmw-crc-podified-edpm-baremetal SUCCESS in 1h 35m 04s
✔️ noop SUCCESS in 0s
cifmw-pod-ansible-test FAILURE in 6m 01s
cifmw-pod-pre-commit FAILURE in 4m 52s
✔️ build-push-container-cifmw-client SUCCESS in 17m 27s
✔️ cifmw-molecule-update SUCCESS in 4m 21s

@ciecierski
Contributor

recheck cifmw-pod-ansible-test


4 participants