
Increase e2e wait timeouts to reduce flakiness #4545

Open
Okabe-Junya wants to merge 1 commit into actions:master from Okabe-Junya:Okabe-Junya/fix/e2e-flaky-timeouts

@Okabe-Junya Okabe-Junya commented Jun 28, 2026

Contributor

What

Raise the wait timeouts in test/actions.github.com/helper.sh to reduce e2e flakiness. Each timeout is overridable via an environment variable (defaults shown).
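As a rough illustration of an env-overridable timeout, here is a minimal sketch. The function `timeout_or_default` and the variable `EXAMPLE_POD_READY_TIMEOUT` are hypothetical names chosen for this example; they are not taken from helper.sh, and the fallback-on-invalid-input behaviour is an assumption.

```shell
# Hypothetical helper, not the code in helper.sh.
# timeout_or_default VALUE DEFAULT
# Prints VALUE when it is a positive integer without a leading zero,
# otherwise prints DEFAULT.
timeout_or_default() {
  case "${1:-}" in
    ''|*[!0-9]*|0*) printf '%s\n' "$2" ;;  # empty, non-numeric, or zero/leading zero: fall back
    *)              printf '%s\n' "$1" ;;  # valid override from the environment
  esac
}

# EXAMPLE_POD_READY_TIMEOUT is an illustrative variable name.
EXAMPLE_POD_READY_TIMEOUT="$(timeout_or_default "${EXAMPLE_POD_READY_TIMEOUT:-}" 120)"
```

With this shape, exporting the variable before the script runs changes the ceiling, and leaving it unset keeps the default.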

Why

The (gha) E2E Tests workflow is flaky on master: roughly half of recent push runs fail. The common cause is short, fixed timeouts on operations whose latency depends on the GitHub Actions service and on runner startup. Each wait exits as soon as its condition is met, so raising the ceilings only helps the slow cases and does not slow down normal runs.
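The point that a higher ceiling costs nothing on fast runs can be seen in a generic polling loop. The sketch below is an assumption about the general pattern, not the implementation in helper.sh; `wait_until` is a hypothetical name.

```shell
# Hypothetical sketch of a poll-until-ready loop.
# wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 once the timeout is reached.
wait_until() {
  local timeout="$1" interval="$2" elapsed=0
  shift 2
  while ! "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    # Only sleep time is counted; time spent inside COMMAND is not.
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

If the condition already holds on the first check, the function returns without sleeping, whether the ceiling is 30 or 120 seconds. The timeout value matters only when the condition is slow to become true.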

e.g.,

@Okabe-Junya Okabe-Junya marked this pull request as ready for review June 28, 2026 13:18
@Okabe-Junya Okabe-Junya requested a review from mumoshu as a code owner June 28, 2026 13:18
Copilot AI review requested due to automatic review settings June 28, 2026 13:18
@Okabe-Junya

Contributor Author

Not sure this PR will fix all of the flaky e2e tests, but let's see after merging.

Copilot AI left a comment

Contributor


Pull request overview

This PR aims to reduce flakiness in the Actions-hosted E2E workflow by increasing upper-bound wait limits in the E2E helper script, while keeping runs fast when conditions are met quickly.

Changes:

  • Increase the scale set pod readiness wait timeout from 30s to 120s (env-overridable).
  • Increase the autoscaling runner set cleanup wait timeout from 40s to 120s (env-overridable).
  • Increase the workflow-run start wait budget from ~60s to ~180s by raising the retry ceiling (env-overridable).
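The third item describes a wait bounded by a retry count rather than a deadline. A minimal sketch of that style follows; `retry` and `retry_budget_seconds` are hypothetical names, and the 3-second interval in the usage lines is an illustrative assumption, since the real interval is not stated in this summary.

```shell
# Hypothetical sketch of an attempt-bounded wait, not the code in helper.sh.
# retry MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 on the first successful attempt, 1 after MAX_ATTEMPTS failures.
retry() {
  local max_attempts="$1" interval="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  return 0
}

# Worst-case time spent sleeping: one sleep between each pair of attempts.
retry_budget_seconds() {
  echo $(( ($1 - 1) * $2 ))
}

retry_budget_seconds 20 3   # prints 57
retry_budget_seconds 60 3   # prints 177
```

Under these assumed numbers, tripling the attempt ceiling roughly triples the worst-case budget, while a run that succeeds on an early attempt still returns at that attempt.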



Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.

@Okabe-Junya Okabe-Junya force-pushed the Okabe-Junya/fix/e2e-flaky-timeouts branch from 9d33f4a to 43b32ca Compare June 28, 2026 13:44
@nikola-jokic

Collaborator

Hey @Okabe-Junya,

We should start adding larger runner machines, but increasing test timeouts is still good if you ask me so we should merge this PR anyway ☺️ Thanks!

@Okabe-Junya

Contributor Author

Thank you!

> We should start adding larger runner machines

Yes, it would be nice to consider this! Can we use specialized machines (provided by the community or by GitHub) on github.com/actions? Or is it better to start with a large runner?

