Skip to content

Conversation

@kaovilai
Copy link
Member

@kaovilai kaovilai commented Nov 4, 2025

Summary

Remove local must-gather directory and build process in favor of using the external quay.io/konveyor/oadp-must-gather:latest image via oc adm must-gather. This eliminates architecture mismatch issues and keeps must-gather code in its dedicated repository at https://github.com/openshift/oadp-must-gather.

Changes

  • Updated RunMustGather() in tests/e2e/lib/apps.go to use oc adm must-gather command
  • Added MUST_GATHER_IMAGE environment variable (defaults to quay.io/konveyor/oadp-must-gather:latest)
  • Removed build-must-gather target from Makefile
  • Removed entire must-gather/ directory (3,174 lines deleted)
  • Updated documentation in docs/developer/testing/TESTING.md

Benefits

  • ✅ No more architecture mismatch issues between local build and cluster
  • ✅ Must-gather code maintained in dedicated repository
  • ✅ Simpler build process (no local compilation needed)
  • ✅ Easy to override image for version-specific testing via MUST_GATHER_IMAGE env var
  • SKIP_MUST_GATHER flag preserved for skipping collection

Usage

Default behavior (uses latest):

make test-e2e

With custom image:

MUST_GATHER_IMAGE=quay.io/konveyor/oadp-must-gather:oadp-1.4 make test-e2e

Skip must-gather:

SKIP_MUST_GATHER=true make test-e2e

Fixes #2005

Testing

  • Code compiles successfully
  • No remaining references to local must-gather binary
  • Documentation updated

Note

Responses generated with Claude

@coderabbitai
Copy link

coderabbitai bot commented Nov 4, 2025

Walkthrough

Removes the local must-gather implementation and build artifacts, deletes must-gather docs/scripts, and updates E2E tests and Makefile to run must-gather via a configurable container image (MUST_GATHER_IMAGE) with glob-based summary discovery.

Changes

Cohort / File(s) Summary
Build & Makefile
Makefile
Removed build-must-gather target and its dependency from test-e2e-setup; removed sed insertion into config/samples/kustomization.yaml.
Testing docs
docs/developer/testing/TESTING.md
Added MUST_GATHER_IMAGE variable and simplified SKIP_MUST_GATHER description to indicate skipping must-gather collection in E2E tests.
E2E tests
tests/e2e/lib/apps.go
Replaced local must-gather invocation with oc adm must-gather --image=${MUST_GATHER_IMAGE}, captures output/errors, uses cluster version to derive clusterID, and performs glob-based lookup for the generated summary file.
Must-gather binary & CLI
must-gather/cmd/main.go, must-gather/pkg/cli.go
Deleted the CLI entrypoint and full must-gather command implementation (flags, cluster interactions, collection orchestration).
Must-gather internals
must-gather/pkg/gather/gather.go, must-gather/pkg/gvk/gvk.go, must-gather/pkg/templates/summary.go
Removed gather helper function, package-level GVK declarations, and the summary templating/writing logic and exported functions.
Must-gather build/module
must-gather/Dockerfile, must-gather/go.mod
Removed the multi-stage Dockerfile and the Go module file for the must-gather submodule.
Must-gather docs & deprecated scripts
must-gather/README.md, must-gather/deprecated/*
Deleted must-gather/README.md and all deprecated wrapper scripts under must-gather/deprecated/ (gather_, gather__essential, etc.).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus review on tests/e2e/lib/apps.go for proper defaulting and robust parsing of MUST_GATHER_IMAGE, correct invocation and error handling of oc adm must-gather, and reliability of the glob-based summary discovery.
  • Verify Makefile removals don't break other targets that expected the must-gather build.
  • Search repository for references to removed public symbols (GVKs, CLI, templates, must-gather module) to ensure no remaining callers.

Assessment against linked issues

Objective Addressed Explanation
Use must-gather from new repo / run as image or submodule (#2005)
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Copy link

openshift-ci bot commented Nov 4, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 4, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/e2e/lib/apps.go (1)

399-410: Add bounds checking before slicing ClusterID.

Line 409 will panic if clusterVersion.Spec.ClusterID is shorter than 8 characters. While cluster IDs are typically UUIDs (36 characters), defensive programming requires bounds checking.

Apply this diff to add bounds checking:

 	clusterVersion := &clusterVersionList.Items[0]
-	clusterID := string(clusterVersion.Spec.ClusterID[:8])
+	clusterIDStr := string(clusterVersion.Spec.ClusterID)
+	if len(clusterIDStr) < 8 {
+		return fmt.Errorf("cluster ID is too short: %s", clusterIDStr)
+	}
+	clusterID := clusterIDStr[:8]
🧹 Nitpick comments (1)
tests/e2e/lib/apps.go (1)

422-432: Consider logging which summary file is being read.

When multiple matches exist (e.g., from previous test runs), line 423 silently uses the first match. Adding a log statement would help with debugging.

 	// Read and validate the summary
+	log.Printf("Reading must-gather summary from: %s", matches[0])
 	mustGatherSummaryContent, err := os.ReadFile(matches[0])
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 513180a and f352b31.

⛔ Files ignored due to path filters (1)
  • must-gather/go.sum is excluded by !**/*.sum
📒 Files selected for processing (24)
  • Makefile (1 hunks)
  • docs/developer/testing/TESTING.md (1 hunks)
  • must-gather/Dockerfile (0 hunks)
  • must-gather/README.md (0 hunks)
  • must-gather/cmd/main.go (0 hunks)
  • must-gather/deprecated/gather_1h (0 hunks)
  • must-gather/deprecated/gather_1h_essential (0 hunks)
  • must-gather/deprecated/gather_24h (0 hunks)
  • must-gather/deprecated/gather_24h_essential (0 hunks)
  • must-gather/deprecated/gather_6h (0 hunks)
  • must-gather/deprecated/gather_6h_essential (0 hunks)
  • must-gather/deprecated/gather_72h (0 hunks)
  • must-gather/deprecated/gather_72h_essential (0 hunks)
  • must-gather/deprecated/gather_all (0 hunks)
  • must-gather/deprecated/gather_all_essential (0 hunks)
  • must-gather/deprecated/gather_metrics_dump (0 hunks)
  • must-gather/deprecated/gather_with_timeout (0 hunks)
  • must-gather/deprecated/gather_without_tls (0 hunks)
  • must-gather/go.mod (0 hunks)
  • must-gather/pkg/cli.go (0 hunks)
  • must-gather/pkg/gather/gather.go (0 hunks)
  • must-gather/pkg/gvk/gvk.go (0 hunks)
  • must-gather/pkg/templates/summary.go (0 hunks)
  • tests/e2e/lib/apps.go (2 hunks)
💤 Files with no reviewable changes (21)
  • must-gather/deprecated/gather_metrics_dump
  • must-gather/deprecated/gather_1h
  • must-gather/deprecated/gather_72h
  • must-gather/go.mod
  • must-gather/deprecated/gather_72h_essential
  • must-gather/deprecated/gather_24h_essential
  • must-gather/deprecated/gather_all_essential
  • must-gather/deprecated/gather_with_timeout
  • must-gather/pkg/gather/gather.go
  • must-gather/cmd/main.go
  • must-gather/README.md
  • must-gather/deprecated/gather_without_tls
  • must-gather/deprecated/gather_6h
  • must-gather/deprecated/gather_1h_essential
  • must-gather/pkg/cli.go
  • must-gather/deprecated/gather_all
  • must-gather/deprecated/gather_6h_essential
  • must-gather/Dockerfile
  • must-gather/pkg/gvk/gvk.go
  • must-gather/pkg/templates/summary.go
  • must-gather/deprecated/gather_24h
🔇 Additional comments (4)
Makefile (1)

792-792: LGTM! Dependency simplified correctly.

The removal of the build-must-gather dependency aligns with the PR objective to use an external container image instead of building locally.

docs/developer/testing/TESTING.md (1)

28-29: LGTM! Documentation clearly describes the new environment variables.

The documentation accurately reflects the new must-gather behavior with environment-driven image selection.

tests/e2e/lib/apps.go (2)

381-387: LGTM! Environment variable handling is correct.

The implementation properly reads the MUST_GATHER_IMAGE environment variable with a sensible default value, and includes helpful comments about version-specific usage.


389-397: LGTM! Command execution and error handling look good.

The oc adm must-gather command is constructed correctly with appropriate flags, and errors are captured with output for debugging.

Comment on lines 411 to 425
// Find the must-gather output directory
// oc adm must-gather creates: <artifact_dir>/quay-io-konveyor-oadp-must-gather-*/clusters/<cluster-id>/
pattern := filepath.Join(artifact_dir, "quay-io-konveyor-oadp-must-gather-*", "clusters", clusterID, "oadp-must-gather-summary.md")
matches, err := filepath.Glob(pattern)
if err != nil {
return fmt.Errorf("error finding must-gather summary: %w", err)
}
if len(matches) == 0 {
return fmt.Errorf("no must-gather summary found at pattern: %s", pattern)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Hardcoded directory pattern breaks custom image support.

The glob pattern on line 413 assumes the directory name will always be quay-io-konveyor-oadp-must-gather-*, but this won't work when users set MUST_GATHER_IMAGE to a different registry or repository (e.g., myregistry.com/my-must-gather:latest). The directory name created by oc adm must-gather is derived from the image URL, so the pattern must be dynamic.

Consider deriving the pattern from the actual image name or using a more flexible glob:

-	// Find the must-gather output directory
-	// oc adm must-gather creates: <artifact_dir>/quay-io-konveyor-oadp-must-gather-*/clusters/<cluster-id>/
-	pattern := filepath.Join(artifact_dir, "quay-io-konveyor-oadp-must-gather-*", "clusters", clusterID, "oadp-must-gather-summary.md")
+	// Find the must-gather output directory
+	// oc adm must-gather creates: <artifact_dir>/<image-based-dir>/clusters/<cluster-id>/
+	// Use a wildcard pattern to match any must-gather output directory
+	pattern := filepath.Join(artifact_dir, "*", "clusters", clusterID, "oadp-must-gather-summary.md")

Additionally, consider logging what was found when the pattern fails to match:

 	matches, err := filepath.Glob(pattern)
 	if err != nil {
 		return fmt.Errorf("error finding must-gather summary: %w", err)
 	}
 	if len(matches) == 0 {
-		return fmt.Errorf("no must-gather summary found at pattern: %s", pattern)
+		// List what directories exist to help debug
+		dirs, _ := filepath.Glob(filepath.Join(artifact_dir, "*"))
+		return fmt.Errorf("no must-gather summary found at pattern: %s\nDirectories in artifact_dir: %v", pattern, dirs)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Find the must-gather output directory
// oc adm must-gather creates: <artifact_dir>/quay-io-konveyor-oadp-must-gather-*/clusters/<cluster-id>/
pattern := filepath.Join(artifact_dir, "quay-io-konveyor-oadp-must-gather-*", "clusters", clusterID, "oadp-must-gather-summary.md")
matches, err := filepath.Glob(pattern)
if err != nil {
return fmt.Errorf("error finding must-gather summary: %w", err)
}
if len(matches) == 0 {
return fmt.Errorf("no must-gather summary found at pattern: %s", pattern)
}
// Find the must-gather output directory
// oc adm must-gather creates: <artifact_dir>/<image-based-dir>/clusters/<cluster-id>/
// Use a wildcard pattern to match any must-gather output directory
pattern := filepath.Join(artifact_dir, "*", "clusters", clusterID, "oadp-must-gather-summary.md")
matches, err := filepath.Glob(pattern)
if err != nil {
return fmt.Errorf("error finding must-gather summary: %w", err)
}
if len(matches) == 0 {
// List what directories exist to help debug
dirs, _ := filepath.Glob(filepath.Join(artifact_dir, "*"))
return fmt.Errorf("no must-gather summary found at pattern: %s\nDirectories in artifact_dir: %v", pattern, dirs)
}

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/e2e/lib/apps.go (1)

411-425: Consider using a wildcard pattern for better robustness.

The current pattern derivation logic (lines 415-418) attempts to mirror how oc adm must-gather names directories, but this could fail if the actual naming logic differs or if edge cases exist (e.g., registries with multiple dots, special characters in tags). The past review suggested using a simpler wildcard pattern, which would be more resilient.

Apply this diff to use a wildcard pattern and add debugging information:

 	// Find the must-gather output directory
-	// oc adm must-gather creates a directory based on the image name with registry separators
-	// replaced by hyphens. E.g., quay.io/konveyor/oadp-must-gather:latest -> quay-io-konveyor-oadp-must-gather-*
-	// We need to derive the pattern from the actual image being used
-	imagePattern := strings.ReplaceAll(mustGatherImage, ":", "-")
-	imagePattern = strings.ReplaceAll(imagePattern, "/", "-")
-	imagePattern = strings.ReplaceAll(imagePattern, ".", "-")
-	pattern := filepath.Join(artifact_dir, imagePattern+"-*", "clusters", clusterID, "oadp-must-gather-summary.md")
+	// oc adm must-gather creates: <artifact_dir>/<image-based-dir>/clusters/<cluster-id>/
+	// Use a wildcard pattern to match any must-gather output directory
+	pattern := filepath.Join(artifact_dir, "*", "clusters", clusterID, "oadp-must-gather-summary.md")
 	matches, err := filepath.Glob(pattern)
 	if err != nil {
 		return fmt.Errorf("error finding must-gather summary: %w", err)
 	}
 	if len(matches) == 0 {
-		return fmt.Errorf("no must-gather summary found at pattern: %s", pattern)
+		// List what directories exist to help debug
+		dirs, _ := filepath.Glob(filepath.Join(artifact_dir, "*"))
+		return fmt.Errorf("no must-gather summary found at pattern: %s\nDirectories in artifact_dir: %v", pattern, dirs)
 	}

This approach is simpler, more robust, and provides better debugging information when the pattern fails to match.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between f352b31 and ac8bcc6.

📒 Files selected for processing (1)
  • tests/e2e/lib/apps.go (2 hunks)
🔇 Additional comments (3)
tests/e2e/lib/apps.go (3)

381-387: LGTM! Clean environment variable handling.

The environment variable handling is well-implemented with a sensible default and helpful inline documentation for users who need version-specific images.


389-397: LGTM! Proper command execution and error handling.

The command execution correctly uses CombinedOutput and wraps errors with the command output for better debugging.


427-441: LGTM! Validation logic is sound.

The summary file reading and validation correctly checks for the expected success message. The TODO comment appropriately flags future enhancement opportunities.

}
clusterVersion := &clusterVersionList.Items[0]
mustGatherPath := fmt.Sprintf("%s/must-gather/clusters/%s/", artifact_dir, string(clusterVersion.Spec.ClusterID[:8]))
clusterID := string(clusterVersion.Spec.ClusterID[:8])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Potential panic: validate ClusterID length before slicing.

The slice operation [:8] assumes the ClusterID is at least 8 characters long. If it's shorter, this will panic with a runtime error.

Apply this diff to safely handle shorter ClusterIDs:

-	clusterID := string(clusterVersion.Spec.ClusterID[:8])
+	clusterID := string(clusterVersion.Spec.ClusterID)
+	if len(clusterID) > 8 {
+		clusterID = clusterID[:8]
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
clusterID := string(clusterVersion.Spec.ClusterID[:8])
clusterID := string(clusterVersion.Spec.ClusterID)
if len(clusterID) > 8 {
clusterID = clusterID[:8]
}
🤖 Prompt for AI Agents
In tests/e2e/lib/apps.go around line 409, the code unconditionally slices
clusterVersion.Spec.ClusterID[:8] which can panic if ClusterID is shorter than 8
characters; validate the length first and only slice up to min(8,
len(ClusterID)). If ClusterID is empty or nil handle that case (e.g., use the
whole string, a safe default, or return an error) and ensure you do not index
past the string length before converting to string.

@kaovilai
Copy link
Member Author

kaovilai commented Nov 4, 2025

/retest

ai-retester: The OADP E2E test failed because the "Mongo application CSI" test timed out while waiting for the todolist pod to succeed. Consequently, the must-gather process also failed due to missing summary files.

The e2e-test-cli-aws-e2e step failed because the Mongo application CSI via CLI test timed out waiting for the todolist-6d7bb9554c-wdtkj pod to become ready, and the must-gather process failed and therefore no summary was generated. Specifically, the pod was stuck in PodInitializingstate. This suggests a problem with the application deployment or configuration in the test environment.

The Mongo application CSI test timed out after 540 seconds, and further attempts to run it also failed due to the todolist pod failing to initialize. The Must-gather stage also failed because it couldn't find a summary.

@kaovilai
Copy link
Member Author

kaovilai commented Nov 4, 2025

ai-retester: The OADP E2E test failed because the "Mongo application CSI" test timed out while waiting for the todolist pod to succeed. Consequently, the must-gather process also failed due to missing summary files.

The e2e-test-cli-aws-e2e step failed because the Mongo application CSI via CLI test timed out waiting for the todolist-6d7bb9554c-wdtkj pod to become ready, and the must-gather process failed and therefore no summary was generated. Specifically, the pod was stuck in PodInitializingstate. This suggests a problem with the application deployment or configuration in the test environment.

The Mongo application CSI test timed out after 540 seconds, and further attempts to run it also failed due to the todolist pod failing to initialize. The Must-gather stage also failed because it couldn't find a summary.

Good, https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_oadp-operator/2007/pull-ci-openshift-oadp-operator-oadp-dev-4.19-e2e-test-aws/1985812568559587328/artifacts/e2e-test-aws/e2e/artifacts/Mongo%20application%20CSI/ still populated, which means this PR works.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 5, 2025
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 513180a and 2 for PR HEAD ac8bcc6 in total

@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD 5048a8f and 1 for PR HEAD ac8bcc6 in total

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2025
kaovilai and others added 2 commits November 5, 2025 14:31
Remove local must-gather directory and build process in favor of using the
external quay.io/konveyor/oadp-must-gather:latest image via oc adm must-gather.
This eliminates architecture mismatch issues and keeps must-gather code in its
dedicated repository.

Changes:
- Updated RunMustGather() in tests/e2e/lib/apps.go to use oc adm must-gather
- Added MUST_GATHER_IMAGE env var (defaults to quay.io/konveyor/oadp-must-gather:latest)
- Removed build-must-gather target from Makefile
- Removed entire must-gather/ directory (3,174 lines deleted)
- Updated documentation in TESTING.md

The SKIP_MUST_GATHER flag is preserved for skipping must-gather collection.
Version-specific images can be used by setting MUST_GATHER_IMAGE env var.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
The directory pattern was hardcoded to 'quay-io-konveyor-oadp-must-gather-*'
which breaks when using custom images via MUST_GATHER_IMAGE env var.

Now dynamically derives the pattern from the actual image name by replacing
registry separators (. / :) with hyphens to match oc adm must-gather's
directory naming convention.

Examples:
- quay.io/konveyor/oadp-must-gather:latest -> quay-io-konveyor-oadp-must-gather-latest-*
- docker.io/myuser/custom:v1 -> docker-io-myuser-custom-v1-*

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 5, 2025
@openshift-ci
Copy link

openshift-ci bot commented Nov 5, 2025

New changes are detected. LGTM label has been removed.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2025
@kaovilai
Copy link
Member Author

kaovilai commented Nov 5, 2025

updated after rebase

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tests/e2e/lib/apps.go (1)

409-409: Potential panic: validate ClusterID length before slicing.

This is the same critical issue flagged in the previous review. The slice operation [:8] will panic at runtime if ClusterID is shorter than 8 characters.

Apply this diff to safely handle shorter ClusterIDs:

-	clusterID := string(clusterVersion.Spec.ClusterID[:8])
+	clusterID := string(clusterVersion.Spec.ClusterID)
+	if len(clusterID) > 8 {
+		clusterID = clusterID[:8]
+	}
🧹 Nitpick comments (1)
tests/e2e/lib/apps.go (1)

423-425: Enhance error message with available directories for debugging.

When the pattern fails to match, listing the actual directories in artifact_dir would help diagnose whether the issue is pattern mismatch or missing must-gather output.

Apply this diff:

 	if len(matches) == 0 {
-		return fmt.Errorf("no must-gather summary found at pattern: %s", pattern)
+		// List what directories exist to help debug
+		dirs, _ := filepath.Glob(filepath.Join(artifact_dir, "*"))
+		return fmt.Errorf("no must-gather summary found at pattern: %s\nDirectories in artifact_dir: %v", pattern, dirs)
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between ac8bcc6 and 2ae6d45.

⛔ Files ignored due to path filters (1)
  • must-gather/go.sum is excluded by !**/*.sum
📒 Files selected for processing (24)
  • Makefile (1 hunks)
  • docs/developer/testing/TESTING.md (1 hunks)
  • must-gather/Dockerfile (0 hunks)
  • must-gather/README.md (0 hunks)
  • must-gather/cmd/main.go (0 hunks)
  • must-gather/deprecated/gather_1h (0 hunks)
  • must-gather/deprecated/gather_1h_essential (0 hunks)
  • must-gather/deprecated/gather_24h (0 hunks)
  • must-gather/deprecated/gather_24h_essential (0 hunks)
  • must-gather/deprecated/gather_6h (0 hunks)
  • must-gather/deprecated/gather_6h_essential (0 hunks)
  • must-gather/deprecated/gather_72h (0 hunks)
  • must-gather/deprecated/gather_72h_essential (0 hunks)
  • must-gather/deprecated/gather_all (0 hunks)
  • must-gather/deprecated/gather_all_essential (0 hunks)
  • must-gather/deprecated/gather_metrics_dump (0 hunks)
  • must-gather/deprecated/gather_with_timeout (0 hunks)
  • must-gather/deprecated/gather_without_tls (0 hunks)
  • must-gather/go.mod (0 hunks)
  • must-gather/pkg/cli.go (0 hunks)
  • must-gather/pkg/gather/gather.go (0 hunks)
  • must-gather/pkg/gvk/gvk.go (0 hunks)
  • must-gather/pkg/templates/summary.go (0 hunks)
  • tests/e2e/lib/apps.go (2 hunks)
💤 Files with no reviewable changes (21)
  • must-gather/deprecated/gather_all_essential
  • must-gather/deprecated/gather_metrics_dump
  • must-gather/Dockerfile
  • must-gather/deprecated/gather_72h
  • must-gather/deprecated/gather_24h_essential
  • must-gather/deprecated/gather_1h
  • must-gather/deprecated/gather_6h
  • must-gather/deprecated/gather_1h_essential
  • must-gather/go.mod
  • must-gather/deprecated/gather_all
  • must-gather/pkg/cli.go
  • must-gather/deprecated/gather_24h
  • must-gather/deprecated/gather_without_tls
  • must-gather/deprecated/gather_with_timeout
  • must-gather/cmd/main.go
  • must-gather/pkg/gather/gather.go
  • must-gather/README.md
  • must-gather/pkg/gvk/gvk.go
  • must-gather/deprecated/gather_72h_essential
  • must-gather/deprecated/gather_6h_essential
  • must-gather/pkg/templates/summary.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/developer/testing/TESTING.md
  • Makefile
🔇 Additional comments (1)
tests/e2e/lib/apps.go (1)

381-397: LGTM: Environment-driven must-gather execution.

The environment variable setup and command execution are well-implemented:

  • Sensible default image with clear upgrade path via MUST_GATHER_IMAGE
  • Proper use of oc adm must-gather with explicit flags
  • Comprehensive error handling with command output included

Comment on lines +415 to +418
imagePattern := strings.ReplaceAll(mustGatherImage, ":", "-")
imagePattern = strings.ReplaceAll(imagePattern, "/", "-")
imagePattern = strings.ReplaceAll(imagePattern, ".", "-")
pattern := filepath.Join(artifact_dir, imagePattern+"-*", "clusters", clusterID, "oadp-must-gather-summary.md")
Copy link

@coderabbitai coderabbitai bot Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Simplify pattern to avoid brittle image-name transformation.

The current approach attempts to replicate oc adm must-gather's directory-naming logic by replacing characters in the image name. This introduces several risks:

  • The transformation may not match oc's actual behavior (e.g., oc might handle edge cases like consecutive dots or special characters differently)
  • The order of replacements could affect the result (e.g., replacing . before : might interact unexpectedly)
  • Future changes to oc adm must-gather could break this logic

Consider using a simpler wildcard pattern that doesn't depend on replicating oc's transformation:

-	// Find the must-gather output directory
-	// oc adm must-gather creates a directory based on the image name with registry separators
-	// replaced by hyphens. E.g., quay.io/konveyor/oadp-must-gather:latest -> quay-io-konveyor-oadp-must-gather-*
-	// We need to derive the pattern from the actual image being used
-	imagePattern := strings.ReplaceAll(mustGatherImage, ":", "-")
-	imagePattern = strings.ReplaceAll(imagePattern, "/", "-")
-	imagePattern = strings.ReplaceAll(imagePattern, ".", "-")
-	pattern := filepath.Join(artifact_dir, imagePattern+"-*", "clusters", clusterID, "oadp-must-gather-summary.md")
+	// Find the must-gather output directory
+	// oc adm must-gather creates: <artifact_dir>/<image-based-dir>/clusters/<cluster-id>/
+	// Use a wildcard to match any must-gather output directory
+	pattern := filepath.Join(artifact_dir, "*", "clusters", clusterID, "oadp-must-gather-summary.md")

This approach is more robust and doesn't require maintaining knowledge of oc's internal naming conventions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
imagePattern := strings.ReplaceAll(mustGatherImage, ":", "-")
imagePattern = strings.ReplaceAll(imagePattern, "/", "-")
imagePattern = strings.ReplaceAll(imagePattern, ".", "-")
pattern := filepath.Join(artifact_dir, imagePattern+"-*", "clusters", clusterID, "oadp-must-gather-summary.md")
// Find the must-gather output directory
// oc adm must-gather creates: <artifact_dir>/<image-based-dir>/clusters/<cluster-id>/
// Use a wildcard to match any must-gather output directory
pattern := filepath.Join(artifact_dir, "*", "clusters", clusterID, "oadp-must-gather-summary.md")
🤖 Prompt for AI Agents
In tests/e2e/lib/apps.go around lines 415 to 418, remove the brittle manual
image-name character replacements and instead use a simple wildcard segment when
building the path to the must-gather summary; replace the transformed
imagePattern/path construction with a pattern that uses a wildcard (e.g., a
single "*" or a "*must-gather*" segment) under artifact_dir so we match the
must-gather directory regardless of how oc transformed the image name, then join
that wildcard segment with "clusters", clusterID and
"oadp-must-gather-summary.md".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up for debate

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

# Check how artifact_dir is used in tests and if multiple must-gather runs could occur
echo "=== Checking artifact_dir usage patterns in e2e tests ==="
rg -n "artifact_dir" --type=go -C2

echo -e "\n=== Checking for multiple RunMustGather calls or patterns ==="
rg -n "RunMustGather" --type=go -C3

echo -e "\n=== Checking if tests clean up artifact directories between runs ==="
rg -n "RemoveAll|MkdirAll.*artifact" --type=go -C2

echo -e "\n=== Looking for any existing glob patterns or directory discovery logic ==="
rg -n "filepath\.Glob|filepath\.Join.*\*" --type=go tests/e2e/

Length of output: 9417


@kaovilai Thanks for being open to discussion! I analyzed the test codebase to understand the context better.

Key findings:

  • artifact_dir is shared across all test suites and defaults to /tmp
  • Multiple RunMustGather() calls can accumulate in the same directory (success cases, failure cases, different test suites)
  • oc adm must-gather creates timestamped directories like quay-io-konveyor-oadp-must-gather-20250105123456
  • Your current code uses matches[0] (first glob match)

Trade-off analysis:

Current approach (image-based transformations):

  • ✅ More specific to the actual image used
  • ❌ Brittle: order-dependent replacements ("." before ":" matters)
  • ❌ Assumes knowledge of oc's internal logic
  • ❌ Could break with future oc changes

Simple wildcard (*):

  • ✅ Robust and maintainable
  • ❌ May match unrelated directories in artifact_dir
  • ❌ With multiple must-gather runs, matches[0] gives first alphabetically, not newest

Recommendation - middle ground:
Use a more specific wildcard pattern with explicit sorting:

// Find the most recent must-gather output directory
// oc adm must-gather creates timestamped directories under artifact_dir
pattern := filepath.Join(artifact_dir, "*oadp-must-gather*", "clusters", clusterID, "oadp-must-gather-summary.md")
matches, err := filepath.Glob(pattern)
if err != nil {
	return fmt.Errorf("error finding must-gather summary: %w", err)
}
if len(matches) == 0 {
	return fmt.Errorf("no must-gather summary found at pattern: %s", pattern)
}
// Sort to get the most recent (timestamps sort lexicographically)
sort.Strings(matches)
summaryPath := matches[len(matches)-1]

mustGatherSummaryContent, err := os.ReadFile(summaryPath)

This approach:

  • Avoids replicating oc's transformation logic
  • Still filters to OADP-specific directories
  • Explicitly picks the most recent run when multiple exist
  • Documents the behavior clearly

Alternative: If you want to keep the transformation approach for specificity, at least add a comment documenting the risk and consider validating the pattern works as expected (maybe log the matched path).

What do you think?


🧬 Code Graph Analysis Results

<no_relevant_code_snippets>

@kaovilai
Copy link
Member Author

kaovilai commented Nov 5, 2025

/retest

ai-retester: The end-to-end (e2e) test e2e-test-cli-aws failed because the Mongo application CSI via CLI test timed out while waiting for the todolist pod to reach the succeeded state. Additionally, the must-gather utility failed to find a summary file. Container test exited with code 2, reason Error.

Why the Mongo test blew up

The entire CI run was green until we hit the Mongo application CSI test.
That test finished the initial phases (DPA & Velero were up, the BSL was
available, and the Mongo/ todolist pods were created) and then we hit the
pausing points:

…waiting for pod to be ready…
  Pod not yet succeeded
…
container "todolist" in pod "todolist-…" is waiting to start: PodInitializing

The test had just finished installing the Mongo database and the
initial‑service container “init‑myservice”. It was the todolist container
that never progressed from PodInitializing to Running. There were no
events showing the image being pulled or started – the only event was the
init container that simply sleeps forever. The pod never found the
quay.io/migtools/oadp-ci-todolist-mongo-go:latest image, so the container
was stuck in PodInitializing and the Eventually guard timed out after
540 s.

In short, the todolist container could not start (most likely the image
was missing/incorrect), so the backup/restore cycle for Mongo never reached
the restore‑verify step, producing the single test failure.

Short failure summary

  1. Mongo CSI backup‑restore test
    The todolist pod never reached a running state.
    The test created a MongoDB pod and a T–odo‑list pod that depended on it.
    The T–odo‑list container was stuck in the PodInitializing phase for over 20 minutes, eventually timing out after 540 s.
    Parallel‑Pod event output shows the pod was repeatedly re‑started and never received the “Running” status.
    Because the test uses the Eventually assertion, the container stayed in PodInitializing and the test failed, causing the whole suite to abort.

  2. Must‑Gather for the test
    The final post‑test must‑gather step could not find the expected oadp‑must‑gather‑summary.md file, so it returned an error.

So, the single failure occurs when the Mongo T‑odo‑list pod failed to start (likely a race or resource issue), and the must‑gather step also failed to locate its summary file.

@kaovilai
Copy link
Member Author

kaovilai commented Nov 7, 2025

/retest

1 similar comment
@kaovilai
Copy link
Member Author

/retest

@weshayutin
Copy link
Contributor

ugh.. AI reviews :)
So.. I'm all for having the option to run w/ the container... HOWEVER..
We need to retain the ability to run from source or a pr from the must-gather repo. This was of course much easier when must-gather lived in the operator repo.

If this not running the must-gather atm in ci? I'll look

@kaovilai
Copy link
Member Author

Feedback is to build from source of a must-gather repo clone/local-dir so that must-gather can be developed on and tested in e2e w/o having to push an image to registry.

@kaovilai kaovilai marked this pull request as draft November 11, 2025 22:35
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use must-gather from new repo

5 participants