Gate the test-realm SW so it stops intercepting when unclaimed#5380
Open
lukemelia wants to merge 1 commit into
Open
Gate the test-realm SW so it stops intercepting when unclaimed#5380lukemelia wants to merge 1 commit into
lukemelia wants to merge 1 commit into
Conversation
The test-realm service worker relays http://test-realm/ browser fetches to the
window and 503s ("No responsive client available") when no client answers
within its 1500ms timeout. `unregister()` does not evict an already-active
worker from a still-loaded page, so once an image/audio module registers it, it
keeps controlling the QUnit runner into later modules. Those modules never
install a `test-realm-fetch` responder, so a boot-time fetch (e.g.
MatrixService.setSystemCard) gets a 503 instead of the expected 404 — boot
fails, the login form renders, and the module cascades or times out. It's
shard-composition-sensitive, which is why it surfaces intermittently.
Gate interception on an `active` flag: the worker defaults to passthrough and
only intercepts while a module has turned it on (in its beforeEach, after the
responder is listening), turning it back off on teardown. A lingering worker
then passes requests straight through, behaving exactly as if no SW were
installed — the state non-intercepting modules already expect and pass under.
The flag defaults off so a terminated-and-restarted worker fails safe toward
passthrough rather than resurrecting the leak; the owning module re-asserts it
each beforeEach.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
Contributor
Author
|
[Claude Code 🤖] CI green: all 20 Host Test shards pass, including 10/12/13 which were the leaked-SW victims. Marking ready for review. |
Contributor
There was a problem hiding this comment.
Pull request overview
This PR addresses intermittent packages/host acceptance-test shard failures caused by the test-realm service worker continuing to intercept http://test-realm/... requests after the module that registered it has finished, leading to unexpected 503 "No responsive client available" responses in later modules.
Changes:
- Adds an opt-in
activegate intest-realm-sw.jsso the SW defaults to passthrough unless explicitly enabled. - Updates the test helper to turn interception on in
beforeEach(after installing the responder) and turn it off inafter(beforeunregister()), using an ack’dpostMessage.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/host/tests/helpers/test-realm-service-worker.ts | Adds setSwActive() and uses it to enable/disable interception around test module lifetime. |
| packages/host/public/test-realm-sw.js | Introduces an in-memory active flag and a message handler to toggle interception; fetch handler now gates interception on active. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+44
to
+67
| async function setSwActive(active: boolean): Promise<void> { | ||
| let controller = navigator.serviceWorker.controller; | ||
| if (!controller) { | ||
| return; | ||
| } | ||
| await new Promise<void>((resolve) => { | ||
| let channel = new MessageChannel(); | ||
| let settled = false; | ||
| let finish = () => { | ||
| if (settled) { | ||
| return; | ||
| } | ||
| settled = true; | ||
| clearTimeout(timeout); | ||
| channel.port1.onmessage = null; | ||
| resolve(); | ||
| }; | ||
| let timeout = setTimeout(finish, 500); | ||
| channel.port1.onmessage = finish; | ||
| controller.postMessage({ type: 'test-realm-sw-set-active', active }, [ | ||
| channel.port2, | ||
| ]); | ||
| }); | ||
| } |
Comment on lines
+29
to
+32
| let port = event.ports && event.ports[0]; | ||
| if (port) { | ||
| port.postMessage({ ok: true, active }); | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Host test shards fail intermittently with whole modules failing at setup (
Promise rejected before … [object Object]) or hanging to a 60s timeout, with the app stuck on the login form. Recent examples on #5373 hittheme-card-test,fallback-models,show-card,workspace-chooser-delete, andworkspace-chooser archive.Root cause: the test-realm service worker leaks across modules
packages/host/public/test-realm-sw.jsintercepts every browser-level fetch tohttp://test-realm/…and relays it to the window viapostMessage, returning503 "No responsive client available"if nothing replies within its 1500ms relay timeout.The client-side responder (the
test-realm-fetchhandler intests/helpers/test-realm-service-worker.ts, viasetupTestRealmServiceWorker) is installed only by the ~13image-def/*andaudio-def/*acceptance modules.unregister()(called in that helper'shooks.after) does not evict an already-active worker from a still-loaded page — the active worker keeps controlling the QUnit runner until the page unloads. So once an image/audio module registers the SW, it lingers into later modules. A later module that does ahttp://test-realm/browser fetch but never installed a responder — e.g. boot-timeMatrixService.start → setSystemCard → store.get(http://test-realm/test/SystemCard/default)— gets a 503. Because a 503 (unlike the expected 404 for a missing SystemCard) is fatal to boot, the app renders the login form, and the module cascades or times out (60s waiting for operator-mode DOM that never appears).Confirmed from the failing logs:
No responsive client availableis emitted only attest-realm-sw.js:~29.IndexRootRoute.model.Acceptance | image def | opening an image URL in a new tab(registers SW) →Acceptance | workspace-chooser-delete(victim), i.e. the exact leaked-SW adjacency.relation "jobs" does not exist/worker-fatallines that also appear are shutdown-time noise (timestamped after tests finish); not causal.Why it's flaky / shard-composition-sensitive: it only bites when an image/audio module lands immediately before a non-SW module that fetches
test-realm/URLs in the same shard/browser session. Adding tests reshuffles shard boundaries, so a PR that adds tests (or a stale branch) can trigger it whilemaindoesn't.Fix
Gate interception on an
activeflag in the SW:beforeEach(after the responder is listening) and off inhooks.after(beforeunregister), each via a small ack'dpostMessage.test-realm/requests straight through — behaving exactly as if no SW were installed, which is the state non-intercepting modules already expect and pass under. No responder is invoked for them, so there is no hanging-fetch failure mode.beforeEach.Rejected alternative
An earlier attempt (on #5373, since reverted) made the responder always-on — installed globally and resolving the network at message time. That backfired: it called
virtualNetwork.fetch()for every leaked request, and for slow/not-found/torn-down fetches the promise never resolved, holding a test waiter sosettled()never completed — turning fast cascades into 243s timeouts and spreading failures from one shard to three. This PR takes the opposite tack (stop intercepting) rather than (always respond).Validation
Host tests can't be run locally (the
ember test --path distharness aborts at boot), so this is validated by CI on this branch. Lint + type-check pass locally.🤖 Generated with Claude Code