flaky test - wait-for-host-standby: retry puppeteer.launch with diagnostics #4872
Merged
Boxel CLI Tests on CI failed because puppeteer.launch timed out after the default 30s waiting for Chrome's DevTools WS endpoint URL to appear in stdout. The existing retry loop only covered page.goto and waitForFunction, so a slow Chrome cold start aborted the script before the loop could even run.

Wrap the launch in its own retry helper:

- explicit 90s launch timeout per attempt (default was 30s)
- up to 3 attempts with a 2s backoff between them, capped by the same 10-minute total deadline as the navigation loop
- structured log lines for each attempt (executable path, args, dumpio, timeout, success/failure timing) so a future flake is debuggable from the CI log alone
- on the final attempt, enable puppeteer's dumpio so Chrome's own stdout/stderr is piped through node; if launch still fails after the retries, we capture *why* (sandbox denial, missing shared lib, GPU init crash) instead of a bare "Timed out … waiting for the WS endpoint URL"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
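For readers without the diff at hand, the retry shape described in the commit message can be sketched roughly as follows. This is an illustrative sketch, not the code from the PR: the names launchWithRetry, RetryOptions, and AttemptInfo are hypothetical, the launch step is passed in as a callback so the example runs without Chrome, and clamping each attempt's timeout to the time left before the deadline is an assumption about how "capped by the total deadline" is applied.

```typescript
// Hypothetical sketch of a bounded launch retry; not the PR's actual helper.
interface RetryOptions {
  maxAttempts: number; // 3 in the PR description
  attemptTimeoutMs: number; // 90_000 in the PR description
  backoffMs: number; // 2_000 in the PR description
  deadlineAt: number; // absolute time in ms, shared with the navigation loop
  now?: () => number; // injectable clock (assumption, for testability)
  sleep?: (ms: number) => Promise<void>; // injectable delay (assumption)
  log?: (line: string) => void;
}

interface AttemptInfo {
  attempt: number;
  timeoutMs: number;
  isFinalAttempt: boolean; // the caller can use this to turn on dumpio
}

async function launchWithRetry<T>(
  launch: (info: AttemptInfo) => Promise<T>,
  opts: RetryOptions,
): Promise<T> {
  const now = opts.now ?? (() => Date.now());
  const sleep =
    opts.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const log = opts.log ?? ((line: string) => console.log(line));
  let lastError: unknown;

  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    // Stop early if the shared deadline has already passed.
    const remainingMs = opts.deadlineAt - now();
    if (remainingMs <= 0) break;

    // Assumption: an attempt never gets more time than the deadline allows.
    const timeoutMs = Math.min(opts.attemptTimeoutMs, remainingMs);
    const isFinalAttempt = attempt === opts.maxAttempts;
    const startedAt = now();
    log(
      `launch attempt ${attempt}/${opts.maxAttempts} ` +
        `timeoutMs=${timeoutMs} final=${isFinalAttempt}`,
    );

    try {
      const result = await launch({ attempt, timeoutMs, isFinalAttempt });
      log(`launch attempt ${attempt} succeeded after ${now() - startedAt}ms`);
      return result;
    } catch (err) {
      lastError = err;
      log(
        `launch attempt ${attempt} failed after ${now() - startedAt}ms: ` +
          String(err),
      );
      // Back off only when another attempt will follow.
      if (!isFinalAttempt) await sleep(opts.backoffMs);
    }
  }

  throw new Error(
    `launch failed: ${String(lastError ?? "deadline passed before any attempt")}`,
  );
}
```

In a script like the one in this PR, the callback would presumably forward timeoutMs and isFinalAttempt to the timeout and dumpio options of puppeteer.launch. Injecting the clock and the sleep function is a choice made here only so the loop can be exercised quickly in a test.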
Pull request overview
Wraps puppeteer.launch in wait-for-host-standby.ts with its own retry loop and a 90s per-attempt timeout to mitigate CI flakiness where Chrome cold-starts exceed Puppeteer's default 30s launch timeout before the existing post-launch retry loop ever runs.
Changes:
- Add launchBrowserWithRetry helper with up to 3 attempts, 2s backoff, and a 90s timeout per attempt, bounded by the existing 10-minute total deadline.
- Enable Puppeteer dumpio on the final attempt (and whenever WAIT_FOR_HOST_STANDBY_VERBOSE=1) to surface Chrome's own stderr when launch ultimately fails.
- Replace the single inline puppeteer.launch call in main() with the new retry helper.
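The dumpio rule in the second bullet comes down to a small predicate. The sketch below is one hypothetical way to write it: the function name shouldEnableDumpio is invented for illustration, and the environment is passed in as a plain object (an assumption made so the example stays self-contained) rather than read from process.env.

```typescript
// Hypothetical helper. The env var name comes from the PR; the function
// name and signature do not.
function shouldEnableDumpio(
  attempt: number,
  maxAttempts: number,
  env: Record<string, string | undefined>,
): boolean {
  // Verbose mode forces Chrome's stdout/stderr through on every attempt.
  const verbose = env["WAIT_FOR_HOST_STANDBY_VERBOSE"] === "1";
  // Otherwise only the last attempt pays the cost of the extra output.
  return verbose || attempt === maxAttempts;
}
```

Keeping dumpio off for early attempts keeps the CI log quiet on the common path, while the last attempt still records Chrome's own output if the launch fails.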
Host Test Results: 1 files, 1 suites, 1h 29m 31s ⏱️. Results for commit 4af133e.
Realm Server Test Results: 1 files, 1 suites, 8m 26s ⏱️. Results for commit 4af133e.
For more details on these errors, see this check.
jurgenwerk approved these changes on May 19, 2026
Why
Boxel CLI Tests on PR #4863 failed with this in the dev-stack startup:
puppeteer.launch timed out before the script's existing retry loop could even start. The retry loop only covered the post-launch page.goto + waitForFunction phases. A slow Chrome cold start on a loaded CI runner aborts the whole script, which fails the Start dev stack step, which fails the job. Chrome cold-start time is a well-known flakiness vector on CI.
What this PR changes
packages/realm-server/scripts/wait-for-host-standby.ts: wrap puppeteer.launch in its own retry helper.
On the final attempt, enable puppeteer's dumpio so Chrome's own stdout/stderr is piped through node; if launch still fails after the retries, we capture why (sandbox denial, missing shared library, GPU init crash) instead of a bare "Timed out … waiting for the WS endpoint URL"
The WAIT_FOR_HOST_STANDBY_VERBOSE=1 env var also now forces dumpio: true on every attempt for local repro.
What this does NOT do
- BrowserManager in the prerender server itself. That path also uses the puppeteer default 30s launch timeout, but it runs under a long-lived service with its own restart/recovery story, so it is out of scope for this PR.
- page.goto / waitForFunction is unchanged.
Test plan