Add E2E test retry jobs for PRs to handle flakiness #56581
Draft
cipolleschi wants to merge 5 commits into main
On PRs, E2E tests (iOS/Android, RNTester/TemplateApp) now retry up to 2 additional times on failure. Each retry runs on a fresh runner to address environment-level flakiness.

- Original E2E jobs use `continue-on-error` on PRs so failures don't block the workflow
- Step-level outcome is captured as a job output to trigger retries
- Retry jobs only run on `pull_request` events
- On `main`, behavior is unchanged: `continue-on-error` is false and the existing `rerun-failed-jobs` mechanism handles retries
- Added `overwrite: true` to artifact uploads in maestro composite actions so retry jobs don't fail on duplicate artifact names
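
The bullets above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual workflow: the job names, runner label, script path, and artifact name are all hypothetical, and the upload step is shown inline rather than inside a maestro composite action. It also assumes job outputs are still published when the job fails under `continue-on-error`.

```yaml
# Hypothetical sketch; names and paths are illustrative, not taken from the PR.
name: e2e-retry-sketch
on:
  pull_request:
  push:
    branches: [main]

jobs:
  e2e_ios:
    runs-on: macos-latest
    # Tolerate failures on PRs only; on main this evaluates to false.
    continue-on-error: ${{ github.event_name == 'pull_request' }}
    outputs:
      # Step-level outcome, exposed so a dependent job can decide to retry.
      outcome: ${{ steps.run_e2e.outcome }}
    steps:
      - uses: actions/checkout@v4
      - id: run_e2e
        run: ./scripts/run-e2e.sh        # hypothetical test entry point
      - uses: actions/upload-artifact@v4
        if: ${{ always() }}
        with:
          name: e2e-ios-results          # hypothetical artifact name
          path: results/                 # hypothetical output directory
          overwrite: true                # a retry may reuse the same name

  e2e_ios_retry_1:
    needs: e2e_ios
    # Retry only on PRs, and only when the original test step failed.
    if: ${{ github.event_name == 'pull_request' && needs.e2e_ios.outputs.outcome == 'failure' }}
    runs-on: macos-latest                # a separate job gets a fresh runner
    continue-on-error: true
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/run-e2e.sh        # hypothetical test entry point
```

A second retry job would follow the same pattern, depending on the first retry instead of the original job.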
Move each E2E test job into its own reusable workflow with an internal `report` job that reliably captures the test result across all matrix combinations. This eliminates the matrix output race condition from the previous approach and reduces `test-all.yml` by ~690 lines.

Each reusable workflow:

- Runs the matrix E2E tests in a `test` job
- Has a non-matrix `report` job that checks `needs.test.result` and exposes a `status` output (success/failure)

The callers in `test-all.yml` are now ~5 lines each instead of ~30-90.
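
A reusable workflow of the shape described above might look like the sketch below. The workflow name, runner labels, and script path are assumptions for illustration; only the `test`/`report` job split, the use of `needs.test.result`, and the `status` output come from the commit message.

```yaml
# Hypothetical reusable workflow; names and paths are illustrative.
name: e2e-ios-rntester-sketch
on:
  workflow_call:
    outputs:
      status:
        description: "success or failure, aggregated over the matrix"
        value: ${{ jobs.report.outputs.status }}

jobs:
  test:
    runs-on: macos-latest
    strategy:
      fail-fast: false
      matrix:
        flavor: [Debug, Release]
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/run-e2e.sh "${{ matrix.flavor }}"   # hypothetical script

  report:
    needs: test
    # Run even when the matrix failed, so the status is always published.
    if: ${{ always() }}
    runs-on: ubuntu-latest
    outputs:
      status: ${{ steps.check.outputs.status }}
    steps:
      - id: check
        env:
          # Aggregated result of every matrix combination of the test job.
          TEST_RESULT: ${{ needs.test.result }}
        run: |
          if [ "$TEST_RESULT" = "success" ]; then
            echo "status=success" >> "$GITHUB_OUTPUT"
          else
            echo "status=failure" >> "$GITHUB_OUTPUT"
          fi
```

Because `report` is a single non-matrix job, its output does not depend on which matrix combination finishes last.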
- Remove the PR-only guard from retry jobs so they also run on main and stable branches, providing consistent retry behavior everywhere
- Simplify `rerun-failed-jobs` to only handle Fantom tests, since E2E retries are now handled by the in-workflow `retry_1`/`retry_2` jobs
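
With the PR-only guard removed, the caller side could be chained roughly as below. This is a sketch under assumptions: the reusable workflow file name and job names are hypothetical, the called workflow is assumed to expose a `status` output, and `always()` is an assumed way to keep the retry from being skipped when the previous call failed.

```yaml
# Hypothetical caller excerpt; the file and job names are illustrative.
name: test-all-sketch
on:
  pull_request:
  push:

jobs:
  e2e_ios:
    uses: ./.github/workflows/e2e-ios-rntester.yml   # hypothetical file

  e2e_ios_retry_1:
    needs: e2e_ios
    # No event_name check: the retry applies on PRs, main, and stable branches.
    if: ${{ always() && needs.e2e_ios.outputs.status == 'failure' }}
    uses: ./.github/workflows/e2e-ios-rntester.yml

  e2e_ios_retry_2:
    needs: e2e_ios_retry_1
    # If retry_1 was skipped its output is empty, so this is skipped as well.
    if: ${{ always() && needs.e2e_ios_retry_1.outputs.status == 'failure' }}
    uses: ./.github/workflows/e2e-ios-rntester.yml
```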
Set minimal `contents: read` permissions to satisfy CodeQL security analysis requirements.
Set top-level `contents: read` to satisfy CodeQL requirements. The rerun-failed-jobs job gets a job-level override adding `actions: write` since it needs to trigger retry-workflow.yml.
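
The permission layout described above might be expressed as in the following sketch. The trigger and the `gh workflow run` step are assumptions about how `retry-workflow.yml` gets dispatched. Because a job-level `permissions` block replaces the top-level one for that job, `contents: read` is repeated here.

```yaml
# Hypothetical excerpt; the trigger and dispatch step are assumptions.
name: permissions-sketch
on:
  pull_request:

# Workflow-wide default: a read-only token.
permissions:
  contents: read

jobs:
  rerun-failed-jobs:
    runs-on: ubuntu-latest
    # Job-level override: this job needs to trigger another workflow.
    permissions:
      contents: read
      actions: write
    steps:
      - env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
        # Assumes retry-workflow.yml declares a workflow_dispatch trigger.
        run: gh workflow run retry-workflow.yml
```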
Summary
- Original E2E jobs use `continue-on-error` on PRs so failures don't block the workflow; on `main`, behavior is unchanged and the existing `rerun-failed-jobs` mechanism continues to work
- Added `overwrite: true` to artifact uploads in maestro composite actions so retry jobs don't conflict on artifact names

How it works
- On PRs: if the original E2E job fails, `retry_1` triggers → if that fails → `retry_2` triggers. All have `continue-on-error` so the workflow stays green.
- On `main`: `continue-on-error` is `false`, retry jobs are skipped (PR-only), and `rerun-failed-jobs` handles retries as before.

Known limitation
Since these are matrix jobs (Debug/Release), the job output uses the last-to-complete matrix combination's value. If only one flavor fails and the passing one finishes last, the retry may not trigger. In the common flakiness pattern (environment-level issues), both flavors tend to be affected, so this works well in practice.
Changelog:
[Internal] - Add E2E test retry jobs for PRs
Test plan
`main`: existing `rerun-failed-jobs` behavior is preserved (retry jobs are skipped)