-
Notifications
You must be signed in to change notification settings - Fork 86
feat: Enable ROSA cluster support in HCP backup/restore tests #1933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: oadp-dev
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mgencur The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
2765acd to
cfd509b
Compare
78ef121 to
d8296b4
Compare
|
|
3773521 to
db120c9
Compare
|
/retest |
|
IMO, the failure are not related to my changes. |
|
@mgencur we're close to getting CI fixed. Appologies we've had a lot of interrupts the last couple weeks. |
|
/retest |
58a205f to
aac86d1
Compare
WalkthroughAdds Makefile variables for HCP external testing and AWS profile propagation; introduces OADPDeploymentOperation with Deploy/Undeploy flows; extends e2e test harness to support Service Cluster kubeconfig, ROSA external mode, ManifestWork management, new HCP helper methods, and adjusted timeouts. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro Cache: Disabled due to data retention organization setting Knowledge base: Disabled due to 📒 Files selected for processing (3)
🧰 Additional context used📓 Path-based instructions (1)**⚙️ CodeRabbit configuration file
Files:
Comment |
aac86d1 to
e31b5f7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/e2e/lib/hcp/hcp.go (1)
728-741: Clarify the purpose of the config read-write cycle.The GetHostedClusterKubeconfig method now reads the kubeconfig, then immediately writes it back before building the rest.Config. This appears to be a no-op unless there's a side effect in the Load/Write cycle (e.g., normalization, validation, or default-filling).
If this is intentional to normalize the kubeconfig data, consider adding a comment explaining why. Otherwise, this could be simplified to directly call
buildConfigFromBytes(kubeconfigData).If the read-write cycle serves no purpose, simplify:
func (h *HCHandler) GetHostedClusterKubeconfig(hc *hypershiftv1.HostedCluster) (*rest.Config, error) { kubeconfigSecret := &corev1.Secret{} err := h.Client.Get(h.Ctx, types.NamespacedName{ Namespace: hc.Namespace, Name: hc.Status.KubeConfig.Name}, kubeconfigSecret) if err != nil { return nil, err } kubeconfigData := kubeconfigSecret.Data["kubeconfig"] - - config, err := ReadKubeconfigFromBytes(kubeconfigData) - if err != nil { - return nil, err - } - - modifiedBytes, err := clientcmd.Write(*config) - if err != nil { - return nil, err - } - - return buildConfigFromBytes(modifiedBytes) + return buildConfigFromBytes(kubeconfigData) }docs/developer/testing/TESTING.md (1)
113-128: LGTM! Clear documentation for ROSA testing.The new section properly documents:
- Required kubeconfig setup (management and service clusters)
- The destructive nature of the tests (deleting ManifestWork resources)
- A complete example with all necessary environment variables
Consider simplifying line 117 per the static analysis hint:
-* In order to break the guest cluster, the tests delete ManifestWork resources on the Service Cluster. +* To break the guest cluster, the tests delete ManifestWork resources on the Service Cluster.This is a minor style improvement and completely optional.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (13)
Makefile(4 hunks)docs/developer/testing/TESTING.md(1 hunks)go.mod(1 hunks)tests/e2e/backup_restore_cli_suite_test.go(1 hunks)tests/e2e/backup_restore_suite_test.go(4 hunks)tests/e2e/e2e_suite_test.go(7 hunks)tests/e2e/hcp_backup_restore_suite_test.go(6 hunks)tests/e2e/hcp_external_cluster_backup_restore_suite_test.go(4 hunks)tests/e2e/lib/dpa_helpers.go(2 hunks)tests/e2e/lib/hcp/hcp.go(11 hunks)tests/e2e/lib/hcp/types.go(1 hunks)tests/e2e/scripts/aws_settings.sh(1 hunks)tests/e2e/virt_backup_restore_suite_test.go(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
docs/developer/testing/TESTING.mdtests/e2e/lib/dpa_helpers.goMakefiletests/e2e/virt_backup_restore_suite_test.gotests/e2e/backup_restore_cli_suite_test.gotests/e2e/backup_restore_suite_test.gotests/e2e/scripts/aws_settings.shtests/e2e/hcp_backup_restore_suite_test.gogo.modtests/e2e/lib/hcp/hcp.gotests/e2e/hcp_external_cluster_backup_restore_suite_test.gotests/e2e/e2e_suite_test.gotests/e2e/lib/hcp/types.go
🪛 LanguageTool
docs/developer/testing/TESTING.md
[style] ~117-~117: Consider a more concise word here.
Context: ...e Cluster with ManifestWork resources * In order to break the guest cluster, the tests dele...
(IN_ORDER_TO_PREMIUM)
🔇 Additional comments (27)
go.mod (1)
25-25: LGTM!The addition of the
open-cluster-management.io/apidependency is necessary to support the new ManifestWork-based operations for ROSA cluster lifecycle management introduced in this PR.Makefile (3)
68-74: LGTM! Well-documented parameters.The new HC_BACKUP_RESTORE_MODE, HC_NAME, and HC_NAMESPACE parameters are properly introduced with clear comments explaining their purpose and default values.
754-754: LGTM! Consistent with existing AWS profile configuration.The VSL_AWS_PROFILE parameter follows the established pattern of BSL_AWS_PROFILE and properly defaults to "default".
845-845: Documentation for SC_KUBECONFIG is adequate—no action required.The documentation in docs/developer/testing/TESTING.md clearly states that "SC_KUBECONFIG must point to the Service Cluster with ManifestWork resources" and provides an example showing its usage in the HC_BACKUP_RESTORE_MODE=external-rosa context. Users are adequately informed of the requirement and how to set it.
tests/e2e/lib/hcp/hcp.go (6)
390-475: LGTM! Proper parameterization of namespace.The DeployHCManifest function now correctly accepts
hcNamespaceas a parameter, making it more flexible for different deployment scenarios (default vs ROSA). The namespace is properly used throughout the function for all resource creations.
698-704: LGTM! Clean utility function.ReadKubeconfigFromBytes is a well-defined utility function that properly encapsulates kubeconfig loading logic.
755-767: LGTM! Clean namespace resolution.The GetManifestWorkNamespace method properly lists ManifestWorks and returns the namespace for the matching cluster ID.
769-818: LGTM! Comprehensive ManifestWork deletion with proper polling.The DeleteManifestWork method properly:
- Extracts the cluster ID from labels
- Resolves the ManifestWork namespace
- Deletes all three ManifestWork resources (main, workers, namespaces)
- Polls for deletion completion with timeout
- Handles not-found errors gracefully
820-877: LGTM! Well-structured backup functionality.The BackupManifestWork method properly:
- Creates timestamped subdirectories to avoid conflicts
- Sets APIVersion and Kind for proper resource serialization
- Uses proper file permissions (0755 for directories, 0644 for files)
- Respects TMP_DIR environment variable with sensible fallback
879-893: LGTM! Consistent deletion check pattern.IsManifestWorkDeleted follows the same pattern as IsHCDeleted and IsHCPDeleted, properly handling not-found errors and propagating other errors.
tests/e2e/backup_restore_cli_suite_test.go (1)
234-234: LGTM! Consistent with deployment operation refactoring.The change from
waitOADPReadinesstoNewOADPDeploymentOperationDefault().Deployaligns with the broader refactoring to use pluggable deployment strategies across the test suite.tests/e2e/virt_backup_restore_suite_test.go (1)
220-220: LGTM! Consistent with deployment operation refactoring.This change matches the refactoring pattern applied across all test suites to use the new deployment operation approach.
tests/e2e/scripts/aws_settings.sh (1)
43-43: LGTM! Proper parameterization of AWS profile.The change from hardcoded "default" to
$VSL_AWS_PROFILEmakes the snapshot location profile configurable and aligns with the existing$BSL_AWS_PROFILEpattern for backup locations.tests/e2e/lib/hcp/types.go (2)
135-138: LGTM! Good use of named constant.Introducing
Wait30Minas a named constant and using it forHCPBackupTimeoutimproves code maintainability and consistency with other timeout constants likeWait10Min.
143-148: LGTM! Proper extension for ROSA support.The addition of the
ClientServiceClusterfield toHCHandlerenables the new ManifestWork operations required for ROSA cluster management, while maintaining backward compatibility with existing code that doesn't use this field.tests/e2e/e2e_suite_test.go (2)
157-169: LGTM: Service Cluster kubeconfig initialization is well-structured.The conditional initialization properly:
- Constructs the config using clientcmd.BuildConfigFromFlags
- Preserves QPS/Burst settings from the main kubeconfig
- Installs the workv1 scheme for ManifestWork operations
- Creates a dedicated CR client for Service Cluster operations
240-244: LGTM: Dynamic deployment operation selection for teardown.The teardown correctly selects between Default and ROSA deployment operations based on the HCBackupRestoreMode, ensuring proper cleanup for each scenario.
tests/e2e/lib/dpa_helpers.go (2)
113-133: LGTM: Clean extraction of BSL spec construction.The BackupStorageLocationSpec() method properly encapsulates BSL configuration, enabling reuse in both Build() (line 79) and CreateBackupStorageLocation() (line 141).
135-203: LGTM: Idempotent BSL/VSL lifecycle methods.All four methods (Create/Delete for BSL and VSL) are properly idempotent:
- Create methods return nil on AlreadyExists
- Delete methods return nil on NotFound
- CreateVolumeSnapshotLocation validates SnapshotLocations presence before accessing the first element
This design supports safe retries and multiple test execution scenarios.
tests/e2e/backup_restore_suite_test.go (2)
46-70: LGTM: Well-designed deployment abstraction.The OADPDeploymentOperation struct provides clear separation between default HCP testing (DPA/VolumeSnapshotClass managed) and ROSA scenarios (BSL/VSL managed). The boolean flags make the deployment strategy explicit and testable.
72-116: LGTM: Deploy() method properly orchestrates resource creation.The deployment sequence is well-structured:
- Conditionally create DPA and wait for reconciliation
- Wait for Node Agent pods (when needed for KOPIA/CSIDataMover)
- Wait for Velero pod
- Conditionally create VolumeSnapshotClass for CSI scenarios
- Create BSL and wait for availability
- Conditionally create VSL
Note: The BSL availability check (lines 106-107) runs regardless of CreateBSL flag, which is correct since BSL exists in both default (via DPA) and ROSA (explicit creation) modes.
tests/e2e/hcp_external_cluster_backup_restore_suite_test.go (2)
83-118: LGTM: Clean refactoring of verification helpers.The extraction of createTestNamespace() and validateTestNamespace() improves code organization and reusability. The use of errors.Join() for aggregating verification errors is appropriate, and createTestNamespace() is properly idempotent by ignoring AlreadyExists errors.
45-60: LGTM: Proper ROSA mode integration.The HCHandler now receives ClientServiceCluster for Service Cluster operations, and the AfterEach correctly selects the deployment operation based on whether the mode is ExternalROSA, ensuring proper cleanup for each scenario.
tests/e2e/hcp_backup_restore_suite_test.go (4)
20-24: LGTM: Clear ROSA mode constant addition.The HCModeExternalROSA constant is well-documented and properly added to the existing mode enumeration with consistent formatting.
44-61: LGTM: Proper OADP deployment orchestration for ROSA.The code correctly:
- Selects ROSA deployment operation when mode is HCModeExternalROSA
- Deletes backup repositories to ensure clean state
- Skips HCP plugin addition for ROSA (DPA is managed via ManifestWork in Service Cluster)
- Retains HCP plugin addition and readiness wait for non-ROSA modes
110-121: LGTM: Conditional ManifestWork operations for ROSA.The switch statement properly isolates ROSA-specific logic:
- HCModeExternalROSA: backs up and deletes ManifestWork (DPA managed in Service Cluster)
- Other modes: use standard RemoveHCP() for cleanup
This ensures ManifestWork operations only run in the appropriate context.
86-98: LGTM: Unified guest verification for external modes.Both HCModeExternal and HCModeExternalROSA now support guest cluster verification with proper kubeconfig retrieval and client validation. The post-restore verification uses updated timeout constants (ValidateHCPTimeout, WaitForNextCheckTimeout) for more flexible timing control.
Also applies to: 134-146
- Add external-rosa mode for HC_BACKUP_RESTORE_MODE to support existing ROSA clusters - Introduce HC_NAMESPACE parameter for configurable cluster namespace management - Add service cluster kubeconfig support via SC_KUBECONFIG parameter for ROSA ManifestWork operations - Implement ManifestWork backup/deletion functionality for ROSA cluster lifecycle management - Add open-cluster-management.io/api dependency to support ManifestWork operations - Create separate OADP deployment operations for default vs ROSA scenarios - Skip DPA HCP plugin modification for ROSA where DPA is managed via ManifestWork - Add VSL_AWS_PROFILE parameter for volume snapshot location AWS profile configuration - Refactor backup/restore suite to use pluggable deployment strategies - Update test configuration to handle both regular HCP and ROSA cluster workflows 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
e31b5f7 to
c9e9437
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (13)
Makefile(4 hunks)docs/developer/testing/TESTING.md(1 hunks)go.mod(1 hunks)tests/e2e/backup_restore_cli_suite_test.go(1 hunks)tests/e2e/backup_restore_suite_test.go(4 hunks)tests/e2e/e2e_suite_test.go(7 hunks)tests/e2e/hcp_backup_restore_suite_test.go(6 hunks)tests/e2e/hcp_external_cluster_backup_restore_suite_test.go(4 hunks)tests/e2e/lib/dpa_helpers.go(2 hunks)tests/e2e/lib/hcp/hcp.go(11 hunks)tests/e2e/lib/hcp/types.go(1 hunks)tests/e2e/scripts/aws_settings.sh(1 hunks)tests/e2e/virt_backup_restore_suite_test.go(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/e2e/scripts/aws_settings.sh
- go.mod
🧰 Additional context used
📓 Path-based instructions (1)
**
⚙️ CodeRabbit configuration file
-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.
Files:
docs/developer/testing/TESTING.mdtests/e2e/virt_backup_restore_suite_test.gotests/e2e/lib/dpa_helpers.gotests/e2e/backup_restore_cli_suite_test.gotests/e2e/lib/hcp/types.gotests/e2e/lib/hcp/hcp.gotests/e2e/e2e_suite_test.gotests/e2e/hcp_external_cluster_backup_restore_suite_test.goMakefiletests/e2e/backup_restore_suite_test.gotests/e2e/hcp_backup_restore_suite_test.go
🔇 Additional comments (5)
tests/e2e/lib/dpa_helpers.go (5)
79-79: LGTM! Clean refactoring.Delegating BSL spec construction to
BackupStorageLocationSpec()improves code organization and reusability.
113-133: LGTM! Well-structured helper method.The
BackupStorageLocationSpec()method properly constructs a complete BSL spec with all necessary fields, improving code maintainability.
135-151: LGTM! Idempotent creation helper.The method correctly creates a BSL resource and handles
AlreadyExistserrors gracefully, making it safe for repeated test execution.
153-166: LGTM! Idempotent deletion helper.The method correctly deletes a BSL resource and handles
NotFounderrors gracefully, ensuring safe cleanup in tests.
190-203: LGTM! Idempotent deletion helper.The method correctly deletes a VSL resource and handles
NotFounderrors gracefully, ensuring safe cleanup in tests.
| func tearDownBackupAndRestore(brCase BackupRestoreCase, installTime time.Time, report ginkgo.SpecReport) { | ||
| log.Println("Post backup and restore state: ", report.State.String()) | ||
| gatherLogs(brCase, installTime, report) | ||
| tearDownDPAResources(brCase) | ||
| NewOADPDeploymentOperationDefault().Undeploy(brCase.BackupRestoreType) | ||
| deleteNamespace(brCase.Namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure ROSA teardown uses matching deployment operation
tearDownBackupAndRestore always instantiates the default operation, so when the suite runs in external-rosa mode we end up calling Undeploy with CreateDPA=true, which deletes the DPA and any CSI resources that are supposed to be preserved by the ManifestWork. That leaves the ROSA environment in a bad state for the next test run. Please select the ROSA variant when the mode is external-rosa, e.g.:
func tearDownBackupAndRestore(brCase BackupRestoreCase, installTime time.Time, report ginkgo.SpecReport) {
log.Println("Post backup and restore state: ", report.State.String())
gatherLogs(brCase, installTime, report)
- NewOADPDeploymentOperationDefault().Undeploy(brCase.BackupRestoreType)
+ op := NewOADPDeploymentOperationDefault()
+ if HCBackupRestoreMode(hcBackupRestoreMode) == HCModeExternalROSA {
+ op = NewOADPDeploymentOperationROSA()
+ }
+ op.Undeploy(brCase.BackupRestoreType)
deleteNamespace(brCase.Namespace)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func tearDownBackupAndRestore(brCase BackupRestoreCase, installTime time.Time, report ginkgo.SpecReport) { | |
| log.Println("Post backup and restore state: ", report.State.String()) | |
| gatherLogs(brCase, installTime, report) | |
| tearDownDPAResources(brCase) | |
| NewOADPDeploymentOperationDefault().Undeploy(brCase.BackupRestoreType) | |
| deleteNamespace(brCase.Namespace) | |
| func tearDownBackupAndRestore(brCase BackupRestoreCase, installTime time.Time, report ginkgo.SpecReport) { | |
| log.Println("Post backup and restore state: ", report.State.String()) | |
| gatherLogs(brCase, installTime, report) | |
| op := NewOADPDeploymentOperationDefault() | |
| if HCBackupRestoreMode(hcBackupRestoreMode) == HCModeExternalROSA { | |
| op = NewOADPDeploymentOperationROSA() | |
| } | |
| op.Undeploy(brCase.BackupRestoreType) | |
| deleteNamespace(brCase.Namespace) | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. The function tearDownBackupAndRestore is only called in the usual/classic cases when HCP is not external. The test cases which work with external HCP choose the operation explicitly there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mgencur Thank you for the clarification! You're correct. Looking at the code structure, tearDownBackupAndRestore is only called in the AfterEach hook of the "Backup and restore tests" describe block, which contains regular application backup/restore test cases (MySQL CSI, Mongo KOPIA, etc.) that don't involve HCP or ROSA modes.
The external HCP tests that use ROSA mode are in separate test files (hcp_external_cluster_backup_restore_suite_test.go) and have their own AfterEach hooks that explicitly select the appropriate deployment operation based on the mode. So the concern about ROSA teardown doesn't apply to this function.
The original review comment was based on an incorrect assumption about the scope of this function. Your implementation is correct as-is.
* Also guard against nil errors
|
/retest |
|
/retest all |
|
@weshayutin: The Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test all |
|
/retest after #2017 merge |
|
@kaovilai: The Use In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/retest |
1 similar comment
|
/retest |
|
@mgencur: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This PRs builds on top of #1921 and adds one more commit with the changes described below. Please review that PR first and then the last commit from the current PR. When pull/1921 is merged the current PR needs to be rebased.
Why the changes were made
How to test the changes made