Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-4.18] OCPBUGS-48710: DownStream Merge Sync from 4.19 [01-22-2025] #2422

Open
wants to merge 59 commits into
base: release-4.18
Choose a base branch
from

Conversation

jluhrsen
Copy link
Contributor

πŸ“‘ Description

Fixes #

Additional Information for reviewers

βœ… Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

oshoval and others added 30 commits December 16, 2024 13:32
ManagedTap binding name was decided to be l2bridge,
use it accordingly.

Signed-off-by: Or Shoval <[email protected]>
Handle host-network pods as default network.
Don't return per-pod errors on startup.
Remove nadController from UDNHostIsolationManager as we don't use it
anymore to find pod's UDN based on NADs that exist in the namespace.

Signed-off-by: Nadia Pinaeva <[email protected]>
This code isnt being used anymore. We dont expect users
to upgrade directly from code which contained the legacy LRPs,
therefore its safe to remove.

Signed-off-by: Martin Kennelly <[email protected]>
L2 UDN: EgressIP hosted by primary interface (`breth0`)
If EncapIP is configured, it means it is different from the node's
primary address. Do not update EncapIP when node's primary address
changes.

Signed-off-by: Yun Zhou <[email protected]>
Assign network ID from network manager running in cluster manager. The
network ID is included in NetInfo and annotated on the NAD along with
the network name. Network managers running in zone & node controllers
will read the network ID from the annotation to set it on NetInfo.

On startup, network manager running in cluster manager will read the
network IDs annotated on the nodes to cover for the upgrade scenario.
Network IDs will still be annotated on the nodes because this PR does
not transition all the code to use the network ID from the NetInfo
instead of the node annotation. That will have to be done progressively.

This have several benefits, among them:
- NetworkID is available sooner overall since we dont have to wait for
  all the nodes to be annotated
- No need to unmarshall the node annotation to get the network IDs, they
  are available in NetInfo
- No need to unmashall the NAD to get the network name, can be accessed
  directly from the annotation.

If a network is replaced with a different one with the same name, the
network ID is reused as the respective network controller will not start
as the previous one is stopped and cleaned up so it shouldn't be a
problem.

Signed-off-by: Jaime CaamaΓ±o Ruiz <[email protected]>
Instead of considering managed VRFs those that follow the mp<id>-udn-vrf
naming template, use the table number: those vrfs associated to a table
within our reserved block of table numbers are managed by us. The block
right now is anything higher than RoutingTableIDStart (1000). This
allows to manage VRFs with any name which is desirable if the name is
going to be exposed through BGP.

Signed-off-by: Jaime CaamaΓ±o Ruiz <[email protected]>
Anticipating that these VRF names are going to be exposed through BGP,
we should to use friendlier names for our VRFs. The most natural name to
use is the network name. Thus giving a cluster UDN a name below 15
characters that matches an already existing VRF not managed by ovn-k
will fail. This is considered an admin problem and not an ovn-k problem
for now.

Signed-off-by: Jaime CaamaΓ±o Ruiz <[email protected]>
Signed-off-by: Jaime CaamaΓ±o Ruiz <[email protected]>
Signed-off-by: Jaime CaamaΓ±o Ruiz <[email protected]>
Was causing deadlocks in unit tests

Signed-off-by: Jaime CaamaΓ±o Ruiz <[email protected]>
…heir subcontrollers

Signed-off-by: Jaime CaamaΓ±o Ruiz <[email protected]>
Signed-off-by: Jaime CaamaΓ±o Ruiz <[email protected]>
Assuming that there is three types of controllers, being: network
agnostic, network aware and network specific; we were already notifying
network specific controllers of network changes. But network aware
controllers, controllers for which we have a single instance capable of
managing multiple networks, had no code path to be informed of netwokr
changes.

This commit adds a code path for that and makes the RouteAdvertisments
controller aware of network changes.

Changed ClusterManager to be the controller manager for cluster manager
instead of secondaryNetworkClusterManager. It just makes more sense that
way sice ClusterManager is the top level manager.

Signed-off-by: Jaime CaamaΓ±o Ruiz <[email protected]>
On CUDN cleanup is inconsistent as we see some flaky tests due to CUDN
"already exist" errors, implying object are not actually deleted.

Wait for CUDN object be gone when deleted

Signed-off-by: Or Mergi <[email protected]>
CUDN is cluster-scoped object, in case tests running in parallel,
having random names avoids conflicting with other tests.

Use random metadata.name for CUDN objects.

The "isolates overlapping CIDRs" tests create objects based on the
'red' and 'blue' variables, including CUDN objects.
Change the tests CUDN creation use random names and update the given
'networkAttachmentConfigParams' with the new generated name.
Update 'red' & 'blue' vaiables with the generated name, carried by
'networkAttachmentConfigParams' (netConfig.name).

The pod2Egress tests asserts on the CUDN object name given by 'userDefinedNetworkName'.
In practice the tests netConfigParam.name is userDefinedNetworkName.
Change the assertion to check the given netConfigParam.

Signed-off-by: Or Mergi <[email protected]>
Reconcile RouteAdvertisements in cluster manager
Add missing enum validation for RouteAdvertisements
The NetPol test checks assigned pod IP only against IPv4 subnet
which would fail on IPv6 only cluster. This commit fixes it by
checking on all valid CIDRs.

Signed-off-by: Periyasamy Palanisamy <[email protected]>
In an unlikely scenario where the service doesn't exist
and there was an issue getting the current active network
the code should not use the service object for the returned error.

Signed-off-by: Patryk Diak <[email protected]>
trozet and others added 9 commits January 10, 2025 15:25
Services controller: Handle startup failures
managedTap: Use the new bridge type name
The test may flake in case the condition is not populated on the
expected time.

Wrap the test assertion with Eventually, so that the test wait for
conditions to populate.

Signed-off-by: Or Mergi <[email protected]>
e2e,net-seg,udn: Wait for condition to populate
… CDN "node" switches

When default network controller (DNC) restarts and a sync is invoked,
it may not exclude transit switches that support different L3 networks
when attempting to cleanup stale Node switches.

Therefore, connectivity between Nodes will be impacted for Pods attached
to L3 networks because its transit switch maybe removed during DNC sync.

Add meta data to secondary networks transit switches in-order for them
to be skipped during cleanup of stale Node switches for CDN.

Signed-off-by: Martin Kennelly <[email protected]>
OVN DNC: ignore all transit switches when cleaning up stale Nodes OVN switch
SDN-4930, OCPBUGS-48622: Downstream Merge [01-15-2025]
@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Jan 22, 2025
@openshift-ci-robot
Copy link
Contributor

@jluhrsen: This pull request references Jira Issue OCPBUGS-48710, which is valid. The bug has been moved to the POST state.

7 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.18.0) matches configured target version for branch (4.18.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)
  • release note type set to "Release Note Not Required"
  • dependent bug Jira Issue OCPBUGS-48709 is in the state Verified, which is one of the valid states (MODIFIED, ON_QA, VERIFIED)
  • dependent Jira Issue OCPBUGS-48709 targets the "4.19.0" version, which is one of the valid target versions: 4.19.0
  • bug has dependents

Requesting review from QA contact:
/cc @anuragthehatter

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

πŸ“‘ Description

Fixes #

Additional Information for reviewers

βœ… Checks

  • My code requires changes to the documentation
  • if so, I have updated the documentation as required
  • My code requires tests
  • if so, I have added and/or updated the tests as required
  • All the tests have passed in the CI

How to verify it

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jluhrsen
Copy link
Contributor Author

/test e2e-metal-ipi-ovn-ipv6-techpreview
/test e2e-aws-ovn-hypershift-conformance-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-metal-ipi-ovn-dualstack-techpreview
/test e2e-vsphere-ovn-techpreview
/test e2e-aws-ovn-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal-ipi-ovn-techpreview
/test openshift-e2e-gcp-ovn-techpreview-upgrade
/payload 4.18 ci blocking
/payload 4.18 nightly blocking

Copy link
Contributor

openshift-ci bot commented Jan 22, 2025

@jluhrsen: trigger 4 job(s) of type blocking for the ci release of OCP 4.18

  • periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.18-e2e-gcp-ovn-upgrade
  • periodic-ci-openshift-hypershift-release-4.18-periodics-e2e-aws-ovn

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0c1ed480-d8fc-11ef-83fb-f2dc91b0d740-0

trigger 13 job(s) of type blocking for the nightly release of OCP 4.18

  • periodic-ci-openshift-release-master-ci-4.18-e2e-aws-upgrade-ovn-single-node
  • periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-upgrade
  • periodic-ci-openshift-release-master-ci-4.18-upgrade-from-stable-4.17-e2e-gcp-ovn-rt-upgrade
  • periodic-ci-openshift-hypershift-release-4.18-periodics-e2e-aws-ovn-conformance
  • periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-serial
  • periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview
  • periodic-ci-openshift-release-master-ci-4.18-e2e-aws-ovn-techpreview-serial
  • periodic-ci-openshift-release-master-nightly-4.18-fips-payload-scan
  • periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-bm
  • periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6
  • periodic-ci-openshift-microshift-release-4.18-periodics-e2e-aws-ovn-ocp-conformance
  • periodic-ci-openshift-microshift-release-4.18-periodics-e2e-aws-ovn-ocp-conformance-serial

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/0c1ed480-d8fc-11ef-83fb-f2dc91b0d740-1

Copy link
Contributor

openshift-ci bot commented Jan 22, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jluhrsen
Once this PR has been reviewed and has the lgtm label, please assign abhat for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jluhrsen
Copy link
Contributor Author

jluhrsen commented Jan 23, 2025

lots of jobs having trouble pulling packages for the 'docker build'. hope it's ephemeral.
/retest

@jluhrsen
Copy link
Contributor Author

/test e2e-metal-ipi-ovn-ipv6-techpreview
/test e2e-aws-ovn-hypershift-conformance-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-metal-ipi-ovn-dualstack-techpreview
/test e2e-vsphere-ovn-techpreview
/test e2e-aws-ovn-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal-ipi-ovn-techpreview
/test openshift-e2e-gcp-ovn-techpreview-upgrade

Copy link
Contributor

openshift-ci bot commented Jan 23, 2025

@jluhrsen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi-ovn-ipv6-techpreview e49d6e1 link false /test e2e-metal-ipi-ovn-ipv6-techpreview
ci/prow/e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview e49d6e1 link false /test e2e-metal-ipi-ovn-dualstack-local-gateway-techpreview
ci/prow/e2e-vsphere-ovn-techpreview e49d6e1 link false /test e2e-vsphere-ovn-techpreview
ci/prow/e2e-gcp-ovn-techpreview e49d6e1 link true /test e2e-gcp-ovn-techpreview
ci/prow/security e49d6e1 link false /test security
ci/prow/e2e-metal-ipi-ovn-techpreview e49d6e1 link false /test e2e-metal-ipi-ovn-techpreview
ci/prow/e2e-metal-ipi-ovn-dualstack-techpreview e49d6e1 link false /test e2e-metal-ipi-ovn-dualstack-techpreview
ci/prow/e2e-azure-ovn-upgrade e49d6e1 link true /test e2e-azure-ovn-upgrade
ci/prow/e2e-aws-ovn-techpreview e49d6e1 link false /test e2e-aws-ovn-techpreview
ci/prow/e2e-azure-ovn-techpreview e49d6e1 link false /test e2e-azure-ovn-techpreview
ci/prow/e2e-aws-ovn-single-node-techpreview e49d6e1 link false /test e2e-aws-ovn-single-node-techpreview
ci/prow/e2e-aws-ovn-hypershift-conformance-techpreview e49d6e1 link false /test e2e-aws-ovn-hypershift-conformance-techpreview

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@jluhrsen
Copy link
Contributor Author

/testwith e2e-gcp-ovn-techpreview openshift/origin#29475

Copy link
Contributor

openshift-ci bot commented Jan 24, 2025

@jluhrsen, testwith: Error processing request. ERROR:

could not determine job runs: requested job is invalid. needs to be formatted like: <org>/<repo>/<branch>/<variant?>/<job>. instead it was: e2e-gcp-ovn-techpreview

Copy link
Contributor

openshift-ci bot commented Jan 24, 2025

@trozet: This PR was included in a payload test run from openshift/origin#29478
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-dualstack-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/e8a33730-daae-11ef-88db-b4ddea624002-0

Copy link
Contributor

openshift-ci bot commented Jan 24, 2025

@trozet: This PR was included in a payload test run from openshift/origin#29478
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/246b2bb0-daaf-11ef-958b-05096a7a0c3b-0

Copy link
Contributor

openshift-ci bot commented Jan 25, 2025

@trozet: This PR was included in a payload test run from openshift/origin#29478
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-ipv6-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4e696e90-db4a-11ef-97fc-196032f4373d-0

Copy link
Contributor

openshift-ci bot commented Jan 25, 2025

@trozet: This PR was included in a payload test run from openshift/origin#29478
trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ipi-ovn-dualstack-techpreview

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/66ba8f60-db4a-11ef-9d35-4def0cade6fb-0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.
Projects
None yet
Development

Successfully merging this pull request may close these issues.