[release-4.20] Backport noOLM Gateway API test coverage and upgrade tests#31322
gcs278 wants to merge 6 commits into
Add upgrade test validating Gateway API migration from OLM-based Istio to CIO-managed Sail Library during 4.21 to 4.22 upgrades.

Setup creates a Gateway and an HTTPRoute with OLM provisioning and tests connectivity. Test validates the migration: the Gateway remains programmed, Istiod is running, the Istio CRDs stay OLM-managed, the GatewayClass has the CIO finalizer, the Istio CR is deleted, and the subscription persists. Teardown cleans up all resources.

Cherry-picked from: cf1f826 openshift#30897
…ip logic

The Gateway API upgrade test was calling g.Skip() from Setup(), which runs inside a goroutine managed by the disruption framework. Since g.Skip() panics and Ginkgo can only recover panics inside leaf nodes, this caused unrecoverable panics on IPv6/dual-stack, OKD, and unsupported platform clusters.

Implement the upgrades.Skippable interface with a Skip() method that the disruption framework calls before Setup, avoiding the goroutine panic. Refactor checkPlatformSupportAndGetCapabilities into shouldSkipGatewayAPITests (safe outside Ginkgo nodes) and getPlatformCapabilities (returns LB/DNS support).

https://redhat.atlassian.net/browse/OCPBUGS-83267

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Cherry-picked from: 8ef51c3 openshift#31000
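The skip-before-Setup ordering described in this commit can be sketched with the standard library alone. This is a minimal illustration, not the code from the PR: the `upgradeTest` and `skippable` interfaces, the `clusterInfo` struct, and the `runSetup` harness are hypothetical stand-ins, and the real `upgrades.Skippable` signature may differ.

```go
package main

import "fmt"

// upgradeTest and skippable are hypothetical stand-ins for the framework
// interfaces; the real signatures may differ.
type upgradeTest interface {
	Name() string
	Setup()
}

type skippable interface {
	Skip() bool
}

// clusterInfo is an assumed summary of the cluster properties that the
// commit message lists as reasons to skip.
type clusterInfo struct {
	DualStackOrIPv6   bool
	OKD               bool
	PlatformSupported bool
}

// shouldSkip returns a decision and a reason instead of panicking, so it can
// be called from any goroutine, not only from inside a Ginkgo leaf node.
func shouldSkip(c clusterInfo) (bool, string) {
	switch {
	case c.DualStackOrIPv6:
		return true, "IPv6/dual-stack cluster"
	case c.OKD:
		return true, "OKD cluster"
	case !c.PlatformSupported:
		return true, "unsupported platform"
	}
	return false, ""
}

type gatewayUpgradeTest struct {
	cluster   clusterInfo
	setupDone bool
}

func (t *gatewayUpgradeTest) Name() string { return "gateway-api-upgrade" }

// Skip reports the decision as a plain boolean; no panic is involved.
func (t *gatewayUpgradeTest) Skip() bool {
	skip, _ := shouldSkip(t.cluster)
	return skip
}

func (t *gatewayUpgradeTest) Setup() { t.setupDone = true }

// runSetup mimics a harness: it asks Skip on the calling goroutine first and
// only then runs Setup inside a separate goroutine.
func runSetup(t upgradeTest) bool {
	if s, ok := t.(skippable); ok && s.Skip() {
		return false
	}
	done := make(chan struct{})
	go func() {
		defer close(done)
		t.Setup()
	}()
	<-done
	return true
}

func main() {
	clusters := []clusterInfo{
		{PlatformSupported: true},
		{OKD: true, PlatformSupported: true},
	}
	for _, c := range clusters {
		t := &gatewayUpgradeTest{cluster: c}
		ran := runSetup(t)
		fmt.Printf("ran=%v setupDone=%v\n", ran, t.setupDone)
	}
}
```

The point of the split is that the skip decision is ordinary data returned to the harness, so nothing inside the Setup goroutine ever needs to unwind through a panic.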
The Gateway API controller tests tracked Gateways in a shared in-memory gateways slice, deleting them during AfterEach cleanup. However, openshift-tests distributes tests across separate parallel worker processes. The annotation-based checkAllTestsDone coordination works correctly because annotations are stored on the cluster-scoped GatewayClass, but the gateways slice is not shared across processes. The process that runs the final AfterEach cleanup has an empty gateways slice, so it deletes the GatewayClass and istiod but never deletes the Gateways created by other processes. This leaves gateway deployments orphaned on the cluster.

As a secondary issue, even when gateways were deleted, the GatewayClass and istiod were removed without waiting for the gateway proxy deployments to be fully cleaned up by GC. Since the deployments have an owner reference to the Gateway (not a finalizer), the cascade deletion is asynchronous, creating a race where gateway pods lose their control plane and crash-loop.

Fix both issues by cleaning up gateways at the individual test level using defer deleteGateway, which deletes the Gateway and waits for its proxy deployment to be removed by GC. Add deleteGateway and waitForGatewayDeploymentDeletion helpers shared by both the controller tests and the upgrade test Teardown. Cleanup errors now hard fail to surface leftover resources immediately rather than causing confusing downstream test failures.

https://redhat.atlassian.net/browse/OCPBUGS-83281

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Grant Spence <gspence@redhat.com>
Co-Authored-By: Ishmam Amin <iamin@redhat.com>

Cherry-picked from: 3f8a12d openshift#31023
Cherry-picked from: e29073f openshift#31023
Cherry-picked from: ca41c36 openshift#31023
Add retry logic to markTestDone to handle optimistic locking conflicts when updating GatewayClass annotations. The CIO actively manages the GatewayClass (updating conditions, status, finalizers) which can cause 409 Conflict errors when tests try to update annotations. Using RetryOnConflict ensures the test automatically retries with the latest resourceVersion when concurrent updates occur.

Fixes flake:
Operation cannot be fulfilled on gatewayclasses.gateway.networking.k8s.io "openshift-default": the object has been modified; please apply your changes to the latest version and try again

https://redhat.atlassian.net/browse/OCPBUGS-81751

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Cherry-picked from: 8e4e43a openshift#30964
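The retry-on-conflict idea behind this commit can be sketched without a cluster. The example below is a simplified stand-in, not the PR's code: `store`, `gatewayClass`, and the local `retryOnConflict` helper are hypothetical, and the helper omits the backoff that client-go's `retry.RetryOnConflict` applies. What it preserves is the essential shape: re-read the latest object on every attempt, reapply the change, and retry only when the failure is a version conflict.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict: the object has been modified")

// gatewayClass is a hypothetical, trimmed-down object with a version counter.
type gatewayClass struct {
	ResourceVersion int
	Annotations     map[string]string
}

// store simulates an API server with optimistic locking. interfere is the
// number of upcoming updates that will race with another writer.
type store struct {
	obj       gatewayClass
	interfere int
}

// get returns a deep copy so that callers cannot mutate the stored object.
func (s *store) get() gatewayClass {
	out := gatewayClass{
		ResourceVersion: s.obj.ResourceVersion,
		Annotations:     map[string]string{},
	}
	for k, v := range s.obj.Annotations {
		out.Annotations[k] = v
	}
	return out
}

// update rejects writes that were based on a stale ResourceVersion.
func (s *store) update(o gatewayClass) error {
	if s.interfere > 0 {
		s.interfere--
		s.obj.ResourceVersion++ // a concurrent writer, e.g. a controller updating status
	}
	if o.ResourceVersion != s.obj.ResourceVersion {
		return errConflict
	}
	o.ResourceVersion++
	s.obj = o
	return nil
}

// retryOnConflict reruns fn while it fails with errConflict, up to attempts
// times. Any other result (including success) is returned immediately.
func retryOnConflict(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); !errors.Is(err, errConflict) {
			return err
		}
	}
	return err
}

// markTestDone re-reads the object inside the retried function so that every
// attempt starts from the latest ResourceVersion.
func markTestDone(s *store, testName string) error {
	return retryOnConflict(5, func() error {
		latest := s.get()
		latest.Annotations[testName] = "done"
		return s.update(latest)
	})
}

func main() {
	s := &store{
		obj:       gatewayClass{ResourceVersion: 1, Annotations: map[string]string{}},
		interfere: 2,
	}
	err := markTestDone(s, "test-a")
	fmt.Println("error:", err, "annotation:", s.obj.Annotations["test-a"])
}
```

The read has to live inside the retried function; fetching the object once outside the loop would resubmit the same stale version on every attempt.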
@gcs278: This pull request references Jira Issue OCPBUGS-88295, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
This pull request references Jira Issue OCPBUGS-82146, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
This pull request references Jira Issue OCPBUGS-78330, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
This pull request references Jira Issue OCPBUGS-85550, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@gcs278: This pull request references Jira Issue OCPBUGS-88295, which is invalid:
This pull request references Jira Issue OCPBUGS-82146, which is invalid:
This pull request references Jira Issue OCPBUGS-78330, which is invalid:
This pull request references Jira Issue OCPBUGS-85550, which is invalid:
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: gcs278
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@gcs278: No Jira issue is referenced in the title of this pull request.
@gcs278: The following test failed, say
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Summary
Backport of Gateway API noOLM (Sail Library) test coverage and upgrade tests to release-4.20, as part of the Sail Library backport (NE-2286). This provides full test coverage for the GatewayAPIWithoutOLM feature gate, including OLM-to-Sail-Library migration upgrade testing, test flake fixes, and parallel worker cleanup fixes.

Depends on #31262
Cherry-picked PRs