[release-4.20] Backport noOLM Gateway API test coverage and upgrade tests #31322

Open
gcs278 wants to merge 6 commits into openshift:release-4.20 from gcs278:backport-noOLM-tests-4.20

Conversation

@gcs278

@gcs278 gcs278 commented Jun 22, 2026

Contributor

Summary

Backport of Gateway API noOLM (Sail Library) test coverage and upgrade tests to release-4.20, as part of the Sail Library backport (NE-2286). This provides full test coverage for the GatewayAPIWithoutOLM feature gate, including OLM-to-Sail-Library migration upgrade testing, test flake fixes, and parallel worker cleanup fixes.

Depends on #31262

Cherry-picked PRs

| PR | Title | Type |
|---|---|---|
| #30897 | NE-2561: Add Gateway API OLM to noOLM migration upgrade test | Feature |
| #30964 | OCPBUGS-81751: Fix GatewayClass update conflict in markTestDone | Bug fix |
| #31000 | OCPBUGS-83267: Use upgrades.Skippable for Gateway API upgrade test skip logic | Bug fix |
| #31023 | OCPBUGS-83281: Fix Gateway cleanup in parallel e2e test workers | Bug fix |

gcs278 and others added 6 commits June 22, 2026 16:52
Add upgrade test validating Gateway API migration from OLM-based Istio
to CIO-managed Sail Library during 4.21 to 4.22 upgrades.

Setup creates Gateway/HTTPRoute with OLM provisioning and tests
connectivity. Test validates migration: Gateway remains programmed,
Istiod running, Istio CRDs stay OLM-managed, GatewayClass has CIO
finalizer, Istio CR deleted, subscription persists. Teardown cleans
up all resources.

Cherry-picked from: cf1f826
openshift#30897
…ip logic

The Gateway API upgrade test was calling g.Skip() from Setup(), which
runs inside a goroutine managed by the disruption framework. Since
g.Skip() panics and Ginkgo can only recover panics inside leaf nodes,
this caused unrecoverable panics on IPv6/dual-stack, OKD, and
unsupported platform clusters.

Implement the upgrades.Skippable interface with a Skip() method that
the disruption framework calls before Setup, avoiding the goroutine
panic. Refactor checkPlatformSupportAndGetCapabilities into
shouldSkipGatewayAPITests (safe outside Ginkgo nodes) and
getPlatformCapabilities (returns LB/DNS support).

https://redhat.atlassian.net/browse/OCPBUGS-83267

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Cherry-picked from: 8ef51c3
openshift#31000
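
The skip pattern described in this commit can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the code from the PR: `UpgradeContext` and `Skippable` are local stand-ins modelled on the upgrade framework's interface, while `gatewayUpgradeTest`, its `platformSupported` field, and `runUpgradeTest` are hypothetical names invented for the example. The point is that `Skip` returns a plain bool and is consulted before `Setup` is launched in a goroutine, so nothing needs to panic there.

```go
package main

import "fmt"

// UpgradeContext and Skippable are local stand-ins modelled on the
// upgrade framework; they are NOT imported from it.
type UpgradeContext struct{}

type Skippable interface {
	Skip(ctx UpgradeContext) bool
}

// gatewayUpgradeTest is a hypothetical test type. platformSupported is an
// assumed field standing in for the real platform and capability checks.
type gatewayUpgradeTest struct {
	platformSupported bool
}

// Skip reports whether the test should be skipped. It returns a plain bool
// and never panics, so it is safe to call outside a Ginkgo leaf node.
func (t gatewayUpgradeTest) Skip(ctx UpgradeContext) bool {
	return !t.platformSupported
}

// Setup assumes Skip has already been consulted by the caller.
func (t gatewayUpgradeTest) Setup() string {
	return "setup ran"
}

// runUpgradeTest mimics a framework that checks Skip first and only then
// runs Setup inside a goroutine (hypothetical helper).
func runUpgradeTest(t interface{ Setup() string }, ctx UpgradeContext) string {
	if s, ok := t.(Skippable); ok && s.Skip(ctx) {
		return "skipped"
	}
	done := make(chan string)
	go func() { done <- t.Setup() }()
	return <-done
}

func main() {
	fmt.Println(runUpgradeTest(gatewayUpgradeTest{platformSupported: true}, UpgradeContext{}))
	fmt.Println(runUpgradeTest(gatewayUpgradeTest{platformSupported: false}, UpgradeContext{}))
}
```

Because the decision is made before any goroutine starts, an unsupported cluster is reported as skipped rather than crashing the run.
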
The Gateway API controller tests tracked Gateways in a shared
in-memory gateways slice, deleting them during AfterEach cleanup.
However, openshift-tests distributes tests across separate parallel
worker processes. The annotation-based checkAllTestsDone coordination
works correctly because annotations are stored on the cluster-scoped
GatewayClass, but the gateways slice is not shared across processes.
The process that runs the final AfterEach cleanup has an empty
gateways slice, so it deletes the GatewayClass and istiod but never
deletes the Gateways created by other processes. This leaves gateway
deployments orphaned on the cluster.

As a secondary issue, even when gateways were deleted, the GatewayClass
and istiod were removed without waiting for the gateway proxy
deployments to be fully cleaned up by GC. Since the deployments have
an owner reference to the Gateway (not a finalizer), the cascade
deletion is asynchronous, creating a race where gateway pods lose
their control plane and crash-loop.

Fix both issues by cleaning up gateways at the individual test level
using defer deleteGateway, which deletes the Gateway and waits for
its proxy deployment to be removed by GC. Add deleteGateway and
waitForGatewayDeploymentDeletion helpers shared by both the controller
tests and the upgrade test Teardown. Cleanup errors now hard fail to
surface leftover resources immediately rather than causing confusing
downstream test failures.

https://redhat.atlassian.net/browse/OCPBUGS-83281

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Grant Spence <gspence@redhat.com>
Co-Authored-By: Ishmam Amin <iamin@redhat.com>

Cherry-picked from: 3f8a12d
openshift#31023
Add retry logic to markTestDone to handle optimistic locking conflicts
when updating GatewayClass annotations. The CIO actively manages the
GatewayClass (updating conditions, status, finalizers) which can cause
409 Conflict errors when tests try to update annotations.

Using RetryOnConflict ensures the test automatically retries with the
latest resourceVersion when concurrent updates occur.

Fixes flake:
  Operation cannot be fulfilled on gatewayclasses.gateway.networking.k8s.io
  "openshift-default": the object has been modified; please apply your
  changes to the latest version and try again

https://redhat.atlassian.net/browse/OCPBUGS-81751

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Cherry-picked from: 8e4e43a
openshift#30964
@openshift-ci-robot added the jira/severity-critical, jira/valid-reference, and jira/invalid-bug labels Jun 22, 2026
@coderabbitai

coderabbitai Bot commented Jun 22, 2026


Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9cb97e43-1c4e-46a1-ae97-49927a0c14e6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands.

@openshift-ci-robot

openshift-ci-robot commented Jun 22, 2026


@gcs278: This pull request references Jira Issue OCPBUGS-88295, which is invalid:

  • expected the bug to target either version "4.20." or "openshift-4.20.", but it targets "4.21.z" instead
  • expected dependent Jira Issue OCPBUGS-86778 to target a version in 4.21.0, 4.21.z, but it targets "4.22.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

This pull request references Jira Issue OCPBUGS-82146, which is invalid:

  • expected the bug to target either version "4.20." or "openshift-4.20.", but it targets "4.21.z" instead
  • expected dependent Jira Issue OCPBUGS-76609 to target a version in 4.21.0, 4.21.z, but it targets "4.22.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

This pull request references Jira Issue OCPBUGS-78330, which is invalid:

  • expected the bug to target either version "4.20." or "openshift-4.20.", but it targets "4.21.z" instead
  • expected dependent Jira Issue OCPBUGS-88300 to target a version in 4.21.0, 4.21.z, but it targets "4.22.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

This pull request references Jira Issue OCPBUGS-85550, which is invalid:

  • expected the bug to target either version "4.20." or "openshift-4.20.", but it targets "4.21.z" instead
  • expected dependent Jira Issue OCPBUGS-88302 to target a version in 4.21.0, 4.21.z, but it targets "4.22.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

Summary

Backport of Gateway API noOLM test coverage and upgrade tests from release-4.21 (#31232) to release-4.20. This adds OLM-to-Sail-Library migration upgrade testing, test flake fixes, and parallel worker cleanup fixes for the GatewayAPIWithoutOLM feature gate.

This is part of an approved SBAR to backport the Sail Library (noOLM) from 4.22 to 4.19–4.21.

Background

Gateway API on OCP 4.19–4.21 uses the Cluster Ingress Operator (CIO) to install Istio via OLM (OSSM operator). This path has several critical bugs:

  • OCPBUGS-88295: OSSM z-stream upgrades are blocked, preventing CVE fixes from being delivered
  • OCPBUGS-82146: OLM-related install failures
  • OCPBUGS-78330: Hardcoded catalog source breaks disconnected environments
  • OCPBUGS-85550: Gateway API fails on clusters without Marketplace capability

In OCP 4.22, NE-2286 replaced OLM with the Sail Library — CIO now installs Istio directly via embedded Helm charts. This feature shipped as GA behind the GatewayAPIWithoutOLM feature gate.

Cherry-picked PRs

| PR | Title |
|---|---|
| #30897 | NE-2292: Add Gateway API OLM to NO-OLM migration upgrade test |
| #31000 | OCPBUGS-83267: Use upgrades.Skippable for Gateway API upgrade test skip logic |
| #31023 | OCPBUGS-83281: Fix Gateway cleanup in parallel e2e test workers |
| #30964 | OCPBUGS-81751: Fix GatewayClass update conflict in markTestDone |

Rollout Plan

Phase 1 — Land code (gate OFF)

Phase 2 — TechPreview soak

Phase 3 — GA promotion

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot requested review from p0lyn0mial and rfredette June 22, 2026 21:00
@openshift-ci

openshift-ci Bot commented Jun 22, 2026

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gcs278
Once this PR has been reviewed and has the lgtm label, please assign dennisperiquet for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gcs278 changed the title from "[release-4.20] OCPBUGS-88295, OCPBUGS-82146, OCPBUGS-78330, OCPBUGS-85550: Backport noOLM Gateway API test coverage and upgrade tests" to "[release-4.20] Backport noOLM Gateway API test coverage and upgrade tests" Jun 22, 2026
@openshift-ci-robot removed the jira/severity-critical, jira/valid-reference, and jira/invalid-bug labels Jun 22, 2026
@openshift-ci-robot


@gcs278: No Jira issue is referenced in the title of this pull request.
To reference a jira issue, add 'XYZ-NNN:' to the title of this pull request and request another refresh with /jira refresh.

Details

In response to this:

Summary

Backport of Gateway API noOLM (Sail Library) test coverage and upgrade tests to release-4.20, as part of the Sail Library backport (NE-2286). This provides full test coverage for the GatewayAPIWithoutOLM feature gate, including OLM-to-Sail-Library migration upgrade testing, test flake fixes, and parallel worker cleanup fixes.

Depends on #31262

Cherry-picked PRs

| PR | Title | Type |
|---|---|---|
| #30897 | NE-2561: Add Gateway API OLM to noOLM migration upgrade test | Feature |
| #30964 | OCPBUGS-81751: Fix GatewayClass update conflict in markTestDone | Bug fix |
| #31000 | OCPBUGS-83267: Use upgrades.Skippable for Gateway API upgrade test skip logic | Bug fix |
| #31023 | OCPBUGS-83281: Fix Gateway cleanup in parallel e2e test workers | Bug fix |

Rollout Plan

Phase 1 — Land code (gate OFF)

Phase 2 — TechPreview soak

Phase 3 — GA promotion

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Contributor

@gcs278: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-aws-ovn-upgrade-rollback | 774076f | link | false | /test e2e-aws-ovn-upgrade-rollback |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

3 participants