Skip to content

chore: release - merge dev into main#1461

Merged
zbigniewsobiecki merged 8 commits into
mainfrom
dev
Jun 26, 2026
Merged

chore: release - merge dev into main#1461
zbigniewsobiecki merged 8 commits into
mainfrom
dev

Conversation

@zbigniewsobiecki

Copy link
Copy Markdown
Member

Automated release PR created by the release workflow.

Commits (8):

e67e8dcd fix(worker): disable setup.sh idle timeout, bump to Node 24, scope agent lint to changed files (#1460)
38f052ac fix(web): remove @theme inline so dark mode CSS tokens propagate correctly (#1459)
767dd948 feat(web): persist debug-analysis in-progress state and disable trigger button (MNG-1668) (#1458)
1ebc5661 docs(architecture): document cross-process debug-analysis status model (#1457)
f8088dc9 test(cli): cover debug-analysis --wait failed branch json + no-timeout (MNG-1669) (#1456)
48bdb263 fix(api): make debug-analysis status durable and cross-process (#1453)
f412c16d feat(web): show "Run is starting…" for ack-time work-item runs links (#1455)
284c6ffb feat(web): show "Run is starting…" pending state for fresh run links (#1454)

aaight and others added 8 commits June 25, 2026 15:13
…1454)

A freshly-shared /runs/<id> URL can point at a run row the worker has not
committed yet (the window between "URL shared in a GitHub/PM comment" and
"worker commits the run row"). Previously the page flashed a misleading
terminal "Run not found" immediately.

- Add shared RunPendingState component (Loader2 spinner + "Run is starting…"
  heading + subtext, optional message prop) for reuse by the work-item runs
  page in a follow-up story.
- Add NOT_FOUND-aware retry to the runs.getById query (retry only while the
  error is NOT_FOUND and within RUN_PENDING_MAX_RETRIES, retryDelay
  RUN_PENDING_POLL_MS) so the page self-heals to the real run within seconds.
- Replace the terminal "Run not found" branch with resolveRunDetailView()
  branching (pending / loading / not-found / error / ready); soften the
  not-found copy to acknowledge the run may have been cancelled/removed.
- Preserve the existing 5s refetchInterval while status === running.

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…1455)

* feat(web): show "Run is starting…" for ack-time work-item runs links

Work-item runs links (`/work-items/$projectId/$workItemId`) are posted at
ack time — before the worker creates the run row — so `workItems.runs`
returns `[]` and the page flashed a terminal "No runs found". Track mount
time, keep polling through the bounded grace window, and render the shared
`RunPendingState` while empty-within-grace.

- Route tracks `mountedAt = useRef(Date.now())` and computes `elapsedMs`.
- Replace the bespoke refetchInterval with the shared
  `workItemRunsRefetchInterval({ hasRunning, isEmpty, elapsedMs })` so the
  page polls through the ack-before-create window.
- Derive `isPending` from `resolveWorkItemRunsView(...)` and pass it into
  `WorkItemRunsTable`; the empty branch renders `RunPendingState` when
  pending, else keeps "No runs found" (default preserved for the PR page).
- Reuses the `run-pending.ts` helper + `RunPendingState` from the earlier
  stories — no new logic primitives.
- Doc: add a run-links note to docs/architecture/08-config-credentials.md
  about the transient "starting" state for links shared before the run row.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(web): re-render work-item runs page at the pending grace boundary

The work-item runs page derived `isPending` from elapsed time during
render but had no driver to re-render once the grace window elapsed with
a still-empty list. React Query structural-shares the same empty-array
reference across polls and the component reads only data/isLoading/
isError/error, so no tracked prop changed after the first empty result —
leaving the page stuck on "Run is starting…" forever instead of
reverting to "No runs found".

Schedule a one-shot timer (a `useState` tick) for the moment the grace
window elapses while pending, so `isPending` is recomputed to false
exactly at the boundary and the table falls back to "No runs found".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* fix(api): make debug-analysis status durable and cross-process

In production the debug-analysis job runs in a separate worker process,
so the dashboard API's in-memory `isAnalysisRunning()` Set never reports
`running`. Derive running/queued/failed status from the durable BullMQ
dashboard-jobs queue (keyed by the deterministic `debug-analysis-<runId>`
id) when REDIS_URL is set, and keep the in-memory + DB path for local dev.

- getDebugAnalysisStatus: queue-mode precedence (in-flight job -> running
  short-circuiting the DB; DB record -> completed; failed job -> failed;
  else idle). Status union gains an additive `failed` literal.
- triggerDebugAnalysis: queue-mode already-running guard throws CONFLICT
  on an in-flight job; re-run removes any prior terminal job then
  re-enqueues with the deterministic id so it is idempotent and reusable.
- Read the REDIS_URL gate at call time (isQueueMode()) instead of a
  module-load const so it reflects the live env and is unit-testable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(api): make debug-analysis status reflect the analysis lifecycle, not spawn

The durable signal the previous approach relied on (the cascade-dashboard-jobs
BullMQ job) reaches `completed` at container *spawn*, not at analysis
completion: guardedSpawn resolves once `container.start()` returns, while the
debug agent runs for tens of seconds to minutes inside the spawned container and
the `debug_analyses` row is written only at the very end. So `running` was
observable only during the brief spawn window — during the actual analysis,
status read `idle` (frontend stopped polling and showed the empty "Run Analysis"
state) and the double-press guard lapsed (a second trigger re-enqueued a
duplicate: 2× LLM cost + duplicate PM comment).

Replace the job-state signal with a worker-owned, durable, cross-process
lifecycle signal that models the analysis itself:

- New `debug_analysis_status` table (PK on analyzed_run_id, ON DELETE CASCADE),
  migration 0056. The worker marks `running` around `triggerDebugAnalysis`,
  clears it on success (a present `debug_analyses` row is then `completed`), and
  marks `failed` on error. The dashboard also marks `running` at trigger time to
  cover the enqueue→spawn window.
- `getDebugAnalysisStatus` and the re-trigger guard now read this row uniformly
  in queue and local-dev mode — no more BullMQ job-state reads. A stale `running`
  row (crashed worker, > DEBUG_ANALYSIS_RUNNING_STALE_MS = 2h, above the 30 min
  worker timeout) is ignored so a crash never wedges the run permanently.
- The deterministic job id + remove-before-submit re-enqueue is kept so a
  near-simultaneous second trigger that slips past the guard cannot spawn a
  duplicate container. `running` is marked only after a successful enqueue so a
  failed enqueue cannot leave a phantom marker blocking retries.
- Remove the worker-local in-memory tracker (`debug-status.ts`) — never visible
  to the dashboard process — and the now-unused `getDashboardJobState` helper.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(cli): surface terminal failed status in debug-analysis --wait loop

The `cascade runs debug <id> --analyze --wait` poll loop only returned on
`completed` and `idle`. After the durable-status PR added a real `failed`
status, a failed analysis was no longer surfaced as `idle`, so the loop
ignored it and polled for the full 5-minute deadline before printing a
misleading "Timed out waiting for debug analysis to complete." message.

Add a `failed` branch mirroring the existing `idle` branch so a failed
analysis returns promptly with an accurate "Debug analysis failed."
message. Also corrects the imprecise comment in runs.ts: the leftover
terminal status row is overwritten by the markDebugAnalysisRunning upsert,
not deleted by deleteDebugAnalysisByRunId.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(debug): cover durable failed-write error path in debug analysis runner

Pins the defensive invariant in triggerDebugAnalysis that a failing
markDebugAnalysisFailed write logs a warning but does not mask or replace
the original agent error. This is the catch-handler branch that codecov
flagged as the uncovered patch lines in src/triggers/shared/debug-runner.ts.
Test-only; no production code change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(api): correct debug-analysis guard comment to reflect uniform durable read

The already-running guard comment claimed it was 'durable in queue mode,
in-memory in local dev' — leftover narrative from the abandoned approach.
The shipped guard (assertDebugAnalysisNotInFlight) reads the durable
debug_analysis_status row uniformly in both queue and local-dev mode; the
in-memory isAnalysisRunning mechanism was deleted. Aligns the comment with
the function's own docstring and the rest of the durable-table design.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(db): exercise debug_analysis_status ON DELETE CASCADE with a targeted delete

The ON DELETE CASCADE test used truncateAll(), which issues
`TRUNCATE ... CASCADE` and clears every referencing table regardless of
the FK action — so it would pass even if the constraint were
NO ACTION/RESTRICT, never actually exercising the clause it names. Delete
only the parent agent_runs row instead: the DELETE now succeeds and
removes the status row solely because the FK cascades; a non-cascading FK
would raise a foreign-key violation. Also assert the status row exists
before the delete to make the cascade observable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(debug): document failed-status coverage caveat for hard kills

The `failed` debug-analysis status is written only from the in-process
catch in triggerDebugAnalysis, so it covers catchable in-process errors
only. A hard kill (watchdog/OOM) or a throw before the runner is reached
(e.g. processDashboardJob failing to load the project config after the
dashboard already marked running) leaves a `running` row that self-stales
to `idle` rather than surfacing `failed`. Documents this deliberate
tradeoff in-code per repeated review feedback (non-blocking).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(debug): note getRunById-null early-return in failed-coverage caveat

The latest review observed that the runner's own early getRunById-null
return is another path where the dashboard-written 'running' marker is
never cleared or flipped to 'failed' (it self-stales to 'idle' after
DEBUG_ANALYSIS_RUNNING_STALE_MS). That early return is a bare return
before the try block, so the existing catch-block caveat did not cover
it. Document the gap at the early-return site and extend the catch-block
caveat to list it. Comment-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(db): document the durable debug_analysis_status table

Records the shipped cross-process debug-analysis status design (new
table from MNG-1667) in the database architecture doc: ER relationship,
key-tables row, and repository purpose. Captures the durable-table
approach that actually shipped, not the abandoned queue-derived design
the reviewer flagged, so the permanent in-repo record is accurate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(db): note failed-status coverage caveat on debug_analysis_status schema

Surface the reviewer's persistently-flagged caveat in the canonical table
definition: `failed` is written only for catchable in-process errors (the
debug-runner's `catch`), so a hard kill (watchdog/OOM) or a throw before the
runner is reached self-stales the `running` row to `idle` rather than
surfacing `failed`. Also names DEBUG_ANALYSIS_RUNNING_STALE_MS and points at
isDebugAnalysisRunActive so the staleness threshold is discoverable from the
schema. Comment-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(db): pin exact staleness boundary for isDebugAnalysisRunActive

Lock the strict `<` against DEBUG_ANALYSIS_RUNNING_STALE_MS so a future
refactor to `<=` (or back) is caught. Existing tests covered only a fresh
(~0ms) and a clearly-stale (threshold + 1s) row; this adds the boundary
itself — exactly at the threshold (stale) and one ms under (active) —
under a frozen clock with real-timer cleanup. That boundary is what
guarantees a crashed/OOM-killed worker never wedges a debug analysis as
permanently `running`, which the durable debug_analysis_status design
relies on.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* fix(worker): mark debug analysis failed on pre-runner config-load failure

When `processDashboardJob` cannot load the project config for a
`debug-analysis` job, it throws before `triggerDebugAnalysis` (whose
`catch` writes the durable `failed` status) is ever reached. The
dashboard has already marked the run `running` at trigger time, so the
`running` row would otherwise linger and self-stale to `idle` after
`DEBUG_ANALYSIS_RUNNING_STALE_MS` rather than surfacing `failed` —
blocking re-trigger with CONFLICT for that ~2h window.

Mark the run `failed` (best-effort, so a status-write error never masks
the original project-not-found failure) at that site, extending `failed`
coverage to this catchable in-process worker error. Hard kills
(watchdog/OOM) remain the documented deliberate follow-up (router-side
reconciliation on non-zero container exit).

Keeps the in-code coverage caveats accurate (debug-runner `catch` block
and the `debug_analysis_status` schema docstring) and adds worker-entry
tests asserting the run is marked failed and that the original error
still propagates when the failed-write itself fails.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(db): document debug-analysis status precedence and failed caveat

Record the shipped durable-table behavior in the merged architecture doc:
the getDebugAnalysisStatus precedence (active running → completed (DB-wins)
→ failed → idle, uniform in queue + local dev) and the failed-coverage
caveat (catchable in-process errors only; hard-kill/OOM self-stales the
running row to idle, with router-side reconciliation the deliberate
follow-up). Documentation-only; no behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: cascade-bot <cascade-bot@users.noreply.github.com>
…t (MNG-1669) (#1456)

The `failed` branch in the `cascade runs debug --analyze --wait` poll loop
shipped in #1453 (commit 48bdb26), but two of MNG-1669's acceptance criteria
lacked dedicated regression coverage:

- AC4 ("returns on failed without waiting for the timeout"): the existing
  test only asserted the failure message. Add a test pinning that the loop
  polls exactly once and never prints the timeout message.
- AC3 ("--json output for the failed case is consistent with other terminal
  branches"): no test existed. Add a test asserting that under --json the
  failed branch logs the plain message, does not emit a JSON object
  (outputJson not called), and does not fetch analysis content — mirroring
  the idle/timeout branches.

Test-only; no production behavior change.

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
#1457)

Record the shipped durable, cross-process debug-analysis status model in
the architecture deep-dives and the trigger README, replacing the stale
"in-memory Set fallback" framing the work item inherited from an
abandoned design.

- 01-services.md: add "Cross-process debug-analysis status" subsection
  documenting the durable `debug_analysis_status` table read uniformly in
  queue + local dev, the writer/reader split, status precedence + 2h
  staleness self-heal, and the deterministic `debug-analysis-<runId>`
  dashboard job as a dedup (not status) mechanism.
- 03-trigger-system.md: expand the auto-debug bullet and add a
  "Debug-analysis status" subsection cross-linking the service + DB docs.
- src/triggers/README.md: note the durable status on the auto-debug row
  and add a focused subsection.

Aligns the docs with the durable-table design already documented in
09-database.md (MNG-1667), which deleted the worker-local in-memory
`debug-status.ts` Set because it was invisible to the dashboard process.

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…er button (MNG-1668) (#1458)

Keep the Debug Analysis panel's in-progress affordance and the disabled
trigger button visible from the moment the analysis is triggered until a
terminal status, instead of flickering back the instant the trigger
mutation settles.

- Extract pure decision helpers into web/src/lib/debug-analysis.ts
  (computeDebugAnalysisRunning, debugAnalysisRefetchInterval) so the
  polling/in-progress logic is node-testable.
- Poll the status query from trigger until a terminal status via a
  polling-active flag (set on trigger success, cleared on completed/failed)
  in addition to status === 'running'.
- isRunning now covers in-flight mutation + just-triggered + queued/running.
- Split the component into a data wrapper + presentational DebugAnalysisView
  so the rendered UX is render-testable; disable both Run / Re-run buttons
  while running and show a spinner affordance.
- Render an error for the failed status and re-enable the button; preserve
  the existing synchronous trigger-error handling.

Co-authored-by: Cascade Bot <bot@cascade.dev>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…ectly (#1459)

Tailwind v4's `@theme inline` bakes token values directly into utility
classes (e.g. bg-card → background-color: oklch(1 0 0)) instead of
emitting CSS custom properties. The .dark { --color-card: ... } overrides
had no effect on those utilities, leaving cards, tab bars, and other
token-backed components stuck at their light-mode colours in dark mode.

Removing `inline` makes Tailwind emit :root CSS custom properties so the
existing .dark { } block takes effect for every token-based utility across
the dashboard — bg-card, bg-muted, bg-input, bg-sidebar, border-border, etc.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ent lint to changed files (#1460)

* fix(worker): disable setup.sh idle timeout, bump to Node 24, scope agent lint to changed files

Three related fixes after a worker run where .cascade/setup.sh was killed
mid-Ruby-compilation and the agent then stalled on a full-tree eslint:fix:

1. setup.sh idle timeout: ruby-build suppresses make output by design,
   causing the 120s idle timeout to fire during compilation. Disable the
   idle timeout for setup.sh; the wall timeout (10min) remains the safety
   net for truly hung scripts. Also surface `reason` in the log so
   wall-timeout kills are distinguishable from idle-timeout kills.

2. Worker image Node 24: bump builder and production stages from
   node:22 to node:24, eliminating the EBADENGINE mismatch for projects
   that require Node >=24 (and reducing what setup.sh needs to install
   via asdf).

3. implementation.eta step 7: replace vague "run linting" instruction
   with an explicit changed-files-only lint pattern. Full-tree eslint --fix
   can take 10+ minutes on large TypeScript codebases and kills sessions.
   Also add a kill-on-timeout instruction to tmux.eta partial.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(prompts): include untracked files in scoped agent lint

Review feedback on PR #1460: the scoped-lint guidance in step 7 still
relies on `git diff --name-only HEAD`, which lists only tracked changes.
At step 7 the agent has not committed yet (commit happens inside CreatePR
at step 8) and nothing was `git add`ed, so brand-new source/test files
are untracked and get excluded from the lint scope. Local lint then
passes while the full-project `lint-and-test` CI job fails on exactly
those new files — the opposite of this step's intent.

Pick up untracked files via `git ls-files --others --exclude-standard`,
dedupe with `sort -u`, and use `xargs -r` so the linter is not invoked
with zero args when nothing matches. Note the same treatment applies to
the Biome/other-linter variant.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Cascade Bot <bot@cascade.dev>
@zbigniewsobiecki zbigniewsobiecki merged commit da22a5b into main Jun 26, 2026
13 of 14 checks passed
@codecov

codecov Bot commented Jun 26, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants