feat(gastown): batch gastown improvements — landing MR loop fix, env var propagation, container tooling, triage dispatch, dead code cleanup, UI polish by jrf0110 · Pull Request #2374 · Kilo-Org/cloud

jrf0110 · 2026-04-13T17:22:43Z

Summary

Batch merge of 9 PRs from gastown-staging into main, covering dispatch reliability, container env propagation, container image tooling, dead code removal, and UI improvements.

Bug Fixes

Break create_landing_mr infinite loop (#2371) — Fixed the reconciler retrying landing MR creation every 5s without a circuit breaker (one town accumulated 5,335 actions over 41 hours). Added deduplication, max attempts with exponential cooldown (capped at 30min), PR URL validation guard, and a race condition fix. New FailConvoy action type for explicit convoy failure with reason tracking.
Propagate custom env_vars to running containers on settings save (#2366) — Custom environment variables set in town settings were not reaching already-running container processes until restart + re-dispatch. Three gaps fixed: syncTownConfigToProcessEnv() now writes custom vars to process.env (with cleanup tracking), syncConfigToContainer() persists custom vars to DO storage, and updateAgentModel() hot-swap overlays fresh custom vars over stale startup snapshots.
Prevent triage batch bead dispatch loop with wrong system prompt (#2321) — Two-layer fix: (A) mark triage batch bead as in_progress before startAgentInContainer() to prevent Rule 2 re-dispatch with generic polecat prompt, and (B) defense-in-depth detection in dispatchAgent to inject the correct triage system prompt even if Rule 2 fires.

Features

Add Java JDK to container images (#2066) — Install default-jdk (OpenJDK) in both prod and dev Dockerfiles to support Java project builds and runtime.
Add cmake and pkg-config to container images (#2060) — Install cmake and pkg-config in both prod and dev Dockerfiles for native dependency builds.
Add town ID copy badge and Debug settings section (#2296) — Copyable town ID badge in the Overview page sticky top bar, plus a new "Debug" section in Town Settings with a "Copy debug info" button that serializes a sanitized JSON snapshot (tokens/emails omitted, env var keys only).
Add dismiss actions for failed beads on Merge Queue page (#2295) — Individual dismiss button per failed bead row, bulk "Dismiss all failed" button with concurrent close + loading/toast feedback, and a view button fallback fix for orphaned MR beads.

Cleanup

Remove dead code from patrol/scheduling/review-queue (#2339) — Removed unused GUPP_WARN_MS export, updated stale comments referencing removed functions (witnessPatrol, schedulePendingWork, processReviewQueue, recoverStuckReviews) to describe the current reconciler/alarm-based architecture.
Remove dead popReviewQueue and update stale comments (#2318) — Deleted dead popReviewQueue() and its Town.do.ts wrapper, removed dead completeReview() wrapper, updated stale comments across review-queue, reconciler, beads, and container-dispatch modules.

Verification

All 9 constituent PRs were individually reviewed and merged to gastown-staging
Typecheck and tests passed on each PR at merge time
Constituent PRs linked above contain detailed per-change verification

Visual Changes

Copyable town ID badge in Overview page top bar (#2296)
Debug settings section with "Copy debug info" button in Town Settings (#2296)
Dismiss (X) button on individual failed bead rows and bulk "Dismiss all failed" button on Merge Queue page (#2295)

Reviewer Notes

The 9 PRs were staged on gastown-staging to batch test interactions before landing on main
The create_landing_mr loop fix (fix(gastown): break create_landing_mr infinite loop (#2260) #2371) stores retry metadata (landing_mr_attempts, last_landing_mr_attempt_at) in the convoy bead's existing metadata JSON column — no schema migration needed
The env var propagation fix (fix(gastown): propagate custom env_vars to running containers on settings save #2366) uses module-level Set tracking in control-server.ts (resets on container restart, which is fine since process.env also resets) and DO durable storage for cross-restart persistence
Custom env var keys that collide with infra keys (GIT_TOKEN, KILOCODE_TOKEN, etc.) are silently skipped to maintain infra precedence
The debug JSON output intentionally excludes PII (emails, tokens show as boolean _set fields only)

#2295) * feat(merges): add dismiss actions for failed beads on Merge Queue page - Add individual Dismiss (X) button to each failed bead row in AttentionItemRow - Add bulk 'Dismiss all failed (N)' button to NeedsAttention header area - Fix view button fallback: open MR bead when sourceBead is null (orphaned beads) - Both individual and bulk dismiss call updateBead with status: 'closed' on the MR bead - Dismiss all shows loading spinner and toast on completion/error * fix(merges): use rigId directly in openDrawer to fix TS typecheck --------- Co-authored-by: John Fawcett <john@kilcoode.ai>

* feat(gastown): add town ID copy badge and Debug settings section * fix(gastown): sanitize debug payload — strip git_url credentials and replace git_author_name with presence flag --------- Co-authored-by: John Fawcett <john@kilcoode.ai>

…2318) * chore(gastown): remove dead popReviewQueue and update stale comments Remove popReviewQueue() from review-queue.ts and its wrapper from Town.do.ts — confirmed no callers in the tRPC router, reconciler, or anywhere else. Also remove the Town.do.ts completeReview() wrapper (also had no external callers) and update stale comments across review-queue.ts, Town.do.ts, reconciler.ts, beads.ts, and container-dispatch.ts that referenced old patrol/scheduling functions (feedStrandedConvoys, rehookOrphanedBeads, schedulePendingWork, recoverStuckReviews, witnessPatrol, processReviewQueue) to reflect the current reconciler-based architecture. Closes #1403 * ci: retrigger Kilo Code Review (previous run failed due to transient clone error) * test(gastown): update integration tests to remove removed popReviewQueue/completeReview APIs popReviewQueue() and completeReview() were removed from TownDO as dead code. Update integration tests to use listBeads({ type: 'merge_request' }) instead of popReviewQueue() to observe MR bead state, and remove the regression guard test for completeReview which is no longer testable via the TownDO public API. --------- Co-authored-by: John Fawcett <john@kilcoode.ai>

…em prompt (#2321) * fix(gastown): prevent triage batch bead dispatch loop with wrong system prompt Option A: Mark triage batch bead as in_progress immediately after hookBead() in maybeDispatchTriageAgent(), before awaiting startAgentInContainer(). This prevents reconciler Rule 2 (idle agent + open hooked bead → dispatch_agent) from re-dispatching the triage bead with the generic polecat prompt on the next tick when container start fails. Rule 3 (stale in_progress, 5-min timeout) resets it to open for a clean retry via maybeDispatchTriageAgent. Option B (defense-in-depth): In applyActionCtx.dispatchAgent, detect triage batch beads (gt:triage label + created_by='patrol') and inject the triage system prompt, ensuring the polecat gets the correct tools even if Rule 2 somehow fires against an open triage batch bead. Fixes #1958 * fix(gastown): set rig_id on triage batch bead so reconciler Rule 1 can re-dispatch after timeout Without rig_id, when Rule 3 resets an abandoned in_progress triage batch bead to 'open', Rule 1 skips it (requires rig_id IS NOT NULL). This left the bead permanently 'open', blocking maybeDispatchTriageAgent from creating a replacement. Setting rig_id ensures Rule 1 can re-dispatch the existing bead (with triage system prompt injected via Option B). --------- Co-authored-by: John Fawcett <john@kilcoode.ai>

Add remaining build-essentials packages (cmake, pkg-config) to both prod and dev Dockerfiles. build-essential and libssl-dev were already present.

Install default-jdk (OpenJDK) in both prod and dev Dockerfiles to support Java project builds and runtime.

…ings save (#2366) * fix(gastown): propagate custom env_vars to running containers on settings save Three gaps fixed: 1. syncTownConfigToProcessEnv() now applies custom env_vars from town config to process.env, with tracking of previously-applied keys so removed vars are deleted from process.env. 2. syncConfigToContainer() now persists custom env_vars to TownContainerDO storage (via container.setEnvVar/deleteEnvVar) so they survive container restarts. Previously-persisted custom keys are tracked in DO storage and cleaned up on removal. 3. updateAgentModel() hot-swap now overlays fresh custom env_vars from getCurrentTownConfig() over the stale startupEnv snapshot. Infra keys in LIVE_ENV_KEYS always take precedence. * fix(gastown): guard custom env_vars against reserved key override - control-server: export getLastAppliedEnvVarKeys() for process-manager - process-manager: delete stale custom keys from hotSwapEnv on hot-swap - Town.do: skip RESERVED_ENV_KEYS when setting custom env_vars on container Addresses 3 review warnings about custom env_vars overriding infra keys. * fix: skip reserved env keys in prevCustomKeys cleanup loop prevCustomKeys may contain reserved keys persisted by the previous implementation (before the RESERVED_ENV_KEYS filter was added). Without this guard the cleanup loop would delete managed infra values like KILOCODE_TOKEN that were just written by envMapping. --------- Co-authored-by: John Fawcett <john@kilcoode.ai>

…1403) (#2339) chore(gastown): remove dead GUPP_WARN_MS export and update stale patrol/queue comments - Remove unused GUPP_WARN_MS constant export from patrol.ts (never referenced outside the file) - Update completion-reporter.ts JSDoc: replace stale witnessPatrol/schedulePendingWork references with reconciler-based description - Update control-server.ts comments: replace stale processReviewQueue/recoverStuckReviews references with current TownDO terminology Part of issue #1403 dead code cleanup. Co-authored-by: John Fawcett <john@kilcoode.ai>

* fix(gastown): break create_landing_mr infinite loop (#2260) Add circuit breaker for landing MR creation to prevent runaway retry loops when convoys have no PR URLs. A town accumulated 5,335 failed actions over 41 hours before this fix. - Fix 1: Deduplicate MR bead creation — skip if an open/in_progress landing MR already exists for the convoy - Fix 2: Max 5 landing MR attempts with exponential cooldown (30s base, 30min cap), fail the convoy when exhausted - Fix 3: PR URL validation guard — skip landing MR creation when no tracked beads have a pr_url - Fix 4: Move convoy fail check before update_convoy_progress to prevent the race where progress updates are emitted for convoys about to be failed/closed Store landing_mr_attempts and last_landing_mr_attempt_at in the convoy bead's metadata JSON field (no schema migration needed). Add FailConvoy action type for explicit convoy failure. * fix(gastown): move max-attempts check after landing MR status lookup The max landing MR attempts guard was firing before checking whether the final landing MR was still active or already merged, making the last allowed attempt impossible to succeed. Now we check landing MR status first and only fail the convoy when no landing MR is active or merged. --------- Co-authored-by: John Fawcett <john@kilcoode.ai>

kilo-code-bot · 2026-04-13T17:32:52Z

services/gastown/src/dos/town/reconciler.ts

-          // If a landing MR is active (open or in_progress), wait for it
-          const hasActiveLanding = landingMrs.some(
-            mr => mr.status === 'open' || mr.status === 'in_progress'
+        if ((convoyBeadsWithPr[0]?.cnt ?? 0) === 0) {


CRITICAL: Final landing is blocked for direct-merge convoys

This guard requires at least one tracked bead MR to have a pr_url, but review-then-land intermediate steps in merge_strategy = 'direct' are merged straight into the convoy feature branch and never persist a PR URL. Once those beads finish, the convoy stays open forever because we keep skipping create_landing_mr.

kilo-code-bot · 2026-04-13T17:32:52Z

services/gastown/container/src/control-server.ts

+  }
+  // Apply current custom env vars, skipping reserved keys.
+  for (const [key, value] of Object.entries(customEnvVars)) {
+    if (RESERVED_ENV_KEYS.has(key)) continue;


WARNING: Custom env vars can overwrite runtime routing vars

/sync-config now copies arbitrary env_vars into process.env, but RESERVED_ENV_KEYS does not include runtime-owned keys like GASTOWN_TOWN_ID and GASTOWN_RIG_ID. A town-level env var with one of those names will clobber the values that the pending-nudge routes and plugin client read from process.env, which breaks callbacks for every running agent in the container.

kilo-code-bot · 2026-04-13T17:32:52Z

services/gastown/container/src/process-manager.ts

+  const freshCustomKeySet = new Set<string>();
+  if (freshEnvVars !== null && typeof freshEnvVars === 'object' && !Array.isArray(freshEnvVars)) {
+    for (const [key, value] of Object.entries(freshEnvVars as Record<string, unknown>)) {
+      if (LIVE_ENV_KEYS.has(key)) continue;


WARNING: Hot-swap can still override reserved infra env vars

This overlay only skips LIVE_ENV_KEYS, not the full reserved set that buildAgentEnv and /sync-config protect. A custom env var like GASTOWN_API_URL or GASTOWN_SESSION_TOKEN will be copied into hotSwapEnv here during model updates and can restart the SDK server with broken callback credentials or URLs.

kilo-code-bot · 2026-04-13T17:33:20Z

Code Review Summary

Status: 3 Issues Found | Recommendation: Address before merge

Overview

Severity	Count
CRITICAL	1
WARNING	2
SUGGESTION	0

Issue Details (click to expand)

CRITICAL

File	Line	Issue
`services/gastown/src/dos/town/reconciler.ts`	1878	Direct-merge `review-then-land` convoys can never create the final landing MR because the new guard requires a tracked bead `pr_url`.

WARNING

File	Line	Issue
`services/gastown/container/src/control-server.ts`	150	`/sync-config` now allows custom env vars to overwrite runtime-owned keys like `GASTOWN_TOWN_ID` and `GASTOWN_RIG_ID`.
`services/gastown/container/src/process-manager.ts`	1277	Model hot-swap still overlays reserved infra env vars from custom `env_vars`, which can restart agents with broken callback credentials or URLs.

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

File	Line	Issue
`N/A`	N/A	No additional observations.

Files Reviewed (18 files)

apps/web/src/app/(app)/gastown/[townId]/TownOverviewPageClient.tsx - 0 issues
apps/web/src/app/(app)/gastown/[townId]/merges/NeedsAttention.tsx - 0 issues
apps/web/src/app/(app)/gastown/[townId]/settings/TownSettingsPageClient.tsx - 0 issues
services/gastown/container/Dockerfile - 0 issues
services/gastown/container/Dockerfile.dev - 0 issues
services/gastown/container/src/completion-reporter.ts - 0 issues
services/gastown/container/src/control-server.ts - 1 issue
services/gastown/container/src/process-manager.ts - 1 issue
services/gastown/src/dos/Town.do.ts - 0 issues
services/gastown/src/dos/town/actions.ts - 0 issues
services/gastown/src/dos/town/beads.ts - 0 issues
services/gastown/src/dos/town/container-dispatch.ts - 0 issues
services/gastown/src/dos/town/patrol.ts - 0 issues
services/gastown/src/dos/town/reconciler.ts - 1 issue
services/gastown/src/dos/town/review-queue.ts - 0 issues
services/gastown/test/integration/review-failure.test.ts - 0 issues
services/gastown/test/integration/rig-alarm.test.ts - 0 issues
services/gastown/test/integration/rig-do.test.ts - 0 issues

Fix these issues in Kilo Cloud

_{Reviewed by gpt-5.4-20260305 · 3,866,793 tokens}

jrf0110 and others added 9 commits April 10, 2026 15:32

feat(gastown): add cmake and pkg-config to container images (#2060)

10f8065

Add remaining build-essentials packages (cmake, pkg-config) to both prod and dev Dockerfiles. build-essential and libssl-dev were already present.

feat(gastown): add Java JDK to container images (#2066)

afef7d5

Install default-jdk (OpenJDK) in both prod and dev Dockerfiles to support Java project builds and runtime.

kilo-code-bot bot reviewed Apr 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(gastown): batch gastown improvements — landing MR loop fix, env var propagation, container tooling, triage dispatch, dead code cleanup, UI polish#2374

feat(gastown): batch gastown improvements — landing MR loop fix, env var propagation, container tooling, triage dispatch, dead code cleanup, UI polish#2374
jrf0110 wants to merge 9 commits intomainfrom
gastown-staging

jrf0110 commented Apr 13, 2026

Uh oh!

kilo-code-bot bot Apr 13, 2026

Uh oh!

kilo-code-bot bot Apr 13, 2026

Uh oh!

kilo-code-bot bot Apr 13, 2026

Uh oh!

kilo-code-bot bot commented Apr 13, 2026 •

edited

Loading

CRITICAL

WARNING

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jrf0110 commented Apr 13, 2026

Summary

Bug Fixes

Features

Cleanup

Verification

Visual Changes

Reviewer Notes

Uh oh!

kilo-code-bot bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot Apr 13, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review Summary

Overview

CRITICAL

WARNING

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

kilo-code-bot bot commented Apr 13, 2026 •

edited

Loading