Diagnostic tracing and levers for dev session hot reload (#2417) #7337

Open
MitchLillie wants to merge 2 commits into main from ml/fix-dev-session-hot-reload-2417


@MitchLillie MitchLillie commented Apr 16, 2026

WHY are these changes introduced?

Investigating shop/issues-admin-extensibility#2417 — dev session hot reloading permanently stops working after ~12-15 file changes. GCS uploads and websocket messages both stop, even though vite rebuilds and admin extension builds keep succeeding. Requires a restart to recover.

This PR adds tracing and diagnostic levers to isolate where the pipeline permanently breaks. No behavioral changes.

WHAT is this pull request doing?

Build ID tracing

Every file change gets a monotonic [build:N] ID that traces through --verbose output at every stage:

[build:5] onChange: 1 event(s): file_updated:index.html
[build:5] processing 1 extension event(s): changed:admin
[build:5] buildExtensions complete
[build:5] post-build steps complete
[build:5] emitting 'all'
[build:5] DevSession.onEvent received
[build:5] validate: VALID (1 events)
[build:5] processEvents: starting bundleExtensionsAndUpload
[build:5] processEvents: result=updated
[build:5] devUIExtensions eventHandler received (1 events)
[build:5] devUIExtensions: payload store updated

When the issue occurs, compare the last trace line printed for the stuck build against the sequence above: the first stage that is missing is where the pipeline hung, threw, or died.
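To avoid scanning a long log by eye, the last line reached by each build can be extracted programmatically. The following is a minimal TypeScript sketch, not code from this PR: the helper name lastStagePerBuild and the BuildTrace shape are hypothetical, and it assumes only the [build:N] prefix format shown in the excerpt above.

```typescript
// Hypothetical helper, not part of the PR: summarize [build:N] trace lines.
interface BuildTrace {
  id: number
  lastLine: string
  lineCount: number
}

// Matches the "[build:N] message" prefix; any leading text (timestamps, etc.) is ignored.
const BUILD_LINE = /\[build:(\d+)\]\s*(.*)$/

function lastStagePerBuild(log: string): BuildTrace[] {
  const byId = new Map<number, BuildTrace>()
  for (const line of log.split(/\r?\n/)) {
    const match = BUILD_LINE.exec(line)
    if (!match) continue
    const id = Number(match[1])
    const message = match[2] ?? ''
    const existing = byId.get(id)
    if (existing) {
      // Later lines for the same build overwrite the "last seen" message.
      existing.lastLine = message
      existing.lineCount += 1
    } else {
      byId.set(id, {id, lastLine: message, lineCount: 1})
    }
  }
  // Return builds in ascending ID order.
  return Array.from(byId.values()).sort((a, b) => a.id - b.id)
}
```

A build whose last line stops short of the sequence shown above is a candidate for where the pipeline stalled.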

Diagnostic levers (env vars)

Each lever tests a theory about what permanently breaks:

| Env var | What it does | Theory it tests |
|---|---|---|
| DEV_SKIP_GENERATE_TYPES=1 | Skip generateExtensionTypes() after build | This call hangs and permanently blocks emit('all'); every subsequent onChange chain piles up behind it |
| DEV_SKIP_RESCAN_IMPORTS=1 | Skip rescanImports() (which can restart chokidar) | A watcher restart during vite mid-write re-globs an empty ./dist, permanently losing the admin extension's watched files |
| DEV_SERIALIZE_ONCHANGE=1 | Serialize onChange handlers with a promise chain | Concurrent .then() chains corrupt shared state (this.app, bundle dir), causing a permanent break |
| DEV_UPLOAD_TIMEOUT_MS=15000 | Add a timeout to bundleExtensionsAndUpload | A hung GCS upload or API call permanently blocks the SerialBatchProcessor |

If a lever prevents the issue, that supports the corresponding theory. If none do, the tracing output should point to a cause not covered here.
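For readers unfamiliar with the pattern behind DEV_SERIALIZE_ONCHANGE, here is a minimal sketch of serializing async handlers with a promise chain. It is an illustration under assumptions, not the PR's implementation: createSerializer and isLeverEnabled are hypothetical names, and reading the lever as a literal "1" is assumed from the usage shown in this description.

```typescript
// Hypothetical sketch: none of these names come from the PR's diff.
type Task<T> = () => Promise<T>

function createSerializer() {
  // Tail of the chain. It never rejects, so a failed task cannot wedge later ones.
  let tail: Promise<unknown> = Promise.resolve()
  return function run<T>(task: Task<T>): Promise<T> {
    // Start this task only after the previous one has settled.
    const result = tail.then(task)
    // Swallow the rejection on the chain only; the caller still sees it on `result`.
    tail = result.catch(() => undefined)
    return result
  }
}

// Assumption: a lever counts as enabled only when set to the string "1".
function isLeverEnabled(name: string, env: Record<string, string | undefined>): boolean {
  return env[name] === '1'
}
```

With this shape, each handler invocation would be wrapped as run(() => handle(events)) when the lever is on and called directly otherwise, so the two modes differ only in scheduling.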

Stress test script

scripts/dev-session-stress-test.sh — toggles source strings in a hosted app at a configurable interval to trigger repeated rebuilds.

How to test

1. Get a snapshot build:

/snapit

2. In a hosted app directory (Preact template with home/), start dev with tracing:

shopify app dev --verbose 2>&1 | tee /tmp/dev-trace.log

3. In another terminal, start the stress test:

# From the CLI repo:
./scripts/dev-session-stress-test.sh /path/to/your/app 5 10

4. Wait for it to break (~12-15 changes), then check the trace:

# Find where the last successful build stopped
grep "\[build:" /tmp/dev-trace.log | tail -20

5. Test levers one at a time to isolate the cause:

# Does skipping generateExtensionTypes prevent the break?
DEV_SKIP_GENERATE_TYPES=1 shopify app dev --verbose 2>&1 | tee /tmp/skip-types.log

# Does serializing onChange prevent it?
DEV_SERIALIZE_ONCHANGE=1 shopify app dev --verbose 2>&1 | tee /tmp/serialize.log

# Does skipping rescanImports prevent it?
DEV_SKIP_RESCAN_IMPORTS=1 shopify app dev --verbose 2>&1 | tee /tmp/skip-rescan.log

# Does adding an upload timeout reveal a hang?
DEV_UPLOAD_TIMEOUT_MS=15000 shopify app dev --verbose 2>&1 | tee /tmp/timeout.log
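The timeout lever in step 5 bounds a call that might otherwise never settle. A minimal sketch of that idea follows, assuming nothing about the PR's actual code: withTimeout and parseTimeoutMs are hypothetical helpers, and treating an unset or non-numeric value as "no timeout" is an assumption.

```typescript
// Hypothetical sketch; the PR's actual timeout code is not shown in this description.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    // Reject if the work has not settled within `ms` milliseconds.
    const timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms)
    work.then(
      (value) => {
        clearTimeout(timer)
        resolve(value)
      },
      (error) => {
        clearTimeout(timer)
        reject(error)
      },
    )
  })
}

// Assumption: unset, non-numeric, or non-positive values mean "no timeout".
function parseTimeoutMs(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined
  const ms = Number(raw)
  return Number.isFinite(ms) && ms > 0 ? ms : undefined
}
```

Note that a timeout of this kind only stops waiting; it does not cancel the underlying upload. That is enough for diagnosis, since a timeout error in the trace would indicate the upload step is where things hang.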

Checklist

  • I've considered possible cross-platform impacts (Mac, Linux, Windows)
  • I've considered possible documentation changes
  • I've considered analytics changes to measure impact
  • The change is user-facing, so I've added a changelog entry with pnpm changeset add

@MitchLillie MitchLillie requested a review from a team as a code owner April 16, 2026 21:19
@MitchLillie

/snapit

@github-actions

🫰✨ Thanks @MitchLillie! Your snapshot has been published to npm.

Test the snapshot by installing your package globally:

pnpm i -g --@shopify:registry=https://registry.npmjs.org @shopify/cli@0.0.0-snapshot-20260416212105

Caution

After installing, validate the version by running shopify version in your terminal.
If the versions don't match, you might have multiple global instances installed.
Use which shopify to find out which one you are running and uninstall it.

@MitchLillie MitchLillie changed the title from "Ml/fix dev session hot reload 2417" to "Fix dev session updates silently stopping after transient errors" Apr 16, 2026
Tracing: every file change gets a monotonic [build:N] ID that traces
through the full pipeline in --verbose output. Logs at every stage:
onChange → handleWatcherEvents → buildExtensions → generateExtensionTypes
→ emit('all') → DevSession.onEvent → validateAppEvent → processEvents
→ bundleExtensionsAndUpload → result

Levers (env vars to isolate the permanent failure in #2417):

  DEV_SKIP_GENERATE_TYPES=1
    Skip generateExtensionTypes() after build. Tests if this call
    hangs and permanently blocks emit('all').

  DEV_SKIP_RESCAN_IMPORTS=1
    Skip rescanImports() which can restart the file watcher. Tests
    if a watcher restart during vite mid-write permanently loses
    the admin extension's watched files.

  DEV_SERIALIZE_ONCHANGE=1
    Serialize onChange handlers with a mutex. Tests if concurrent
    .then() chains corrupting shared state (this.app, bundle dir)
    cause the permanent break.

  DEV_UPLOAD_TIMEOUT_MS=15000
    Add a timeout to bundleExtensionsAndUpload. Tests if a hung
    GCS upload or API call permanently blocks the SerialBatchProcessor.
@MitchLillie MitchLillie force-pushed the ml/fix-dev-session-hot-reload-2417 branch from 0c62648 to 62660b2 Compare April 16, 2026 21:44
Toggles source code strings in a hosted app every N seconds to trigger
repeated vite rebuilds. Used alongside the diagnostic levers to isolate
the permanent failure in #2417.
@MitchLillie MitchLillie changed the title from "Fix dev session updates silently stopping after transient errors" to "Diagnostic tracing and levers for dev session hot reload (#2417)" Apr 16, 2026
@MitchLillie

/snapit

@github-actions

🫰✨ Thanks @MitchLillie! Your snapshot has been published to npm.

Test the snapshot by installing your package globally:

pnpm i -g --@shopify:registry=https://registry.npmjs.org @shopify/cli@0.0.0-snapshot-20260416214923
