Skip to content

Feature/kiloclaw admin cli admin view and schema#2376

Open
evanjacobson wants to merge 52 commits intomainfrom
feature/kiloclaw-admin-cli_admin-view-and-schema
Open

Feature/kiloclaw admin cli admin view and schema#2376
evanjacobson wants to merge 52 commits intomainfrom
feature/kiloclaw-admin-cli_admin-view-and-schema

Conversation

@evanjacobson
Copy link
Copy Markdown
Contributor

@evanjacobson evanjacobson commented Apr 13, 2026

Summary

Recover with Kilo was missing the basic safeguards and visibility needed to be a reliable support workflow: admins could not cleanly initiate recovery runs for users, concurrent starts could race, and users had no dependable way to get back to an in-progress run instead of accidentally starting another. This phase 1 tightens that loop so recovery runs behave like a single tracked operation per instance and are visible from the places operators and users already expect.

This change adds admin-initiated Kilo CLI recovery runs, persists run metadata so status can be correlated across app and controller state, and surfaces in-progress runs from the user flow via the existing button path. It follows the existing KiloClaw patterns of instance-scoped routing, DB-backed operational metadata, admin audit logging, and controller/platform conflict mapping rather than introducing a new recovery-specific workflow shape.

Verification

  • tests pass
  • manually tested

Visual Changes

Reviewer Notes

  • A follow-up PR will address the same behaviors for org instances, in addition to other fixes for orgs. Orgs are currently non functional with this flow regardless, and this PR was large as-is, so I am scoping that work separately.
Known Limitations

  • A CLI run can be "orphaned" if the DB insert fails, which will block the user from starting a run until the one active on the machine clears. I chose not to scope this into this PR as it is a very rare edge case, and the CLI runs are quick.
  • There are known issues with orgs that I will address in a follow up

…acking

    - Add initiated_by column to kiloclaw_cli_runs (admin/user) with migration
    - Create shared createCliRun util in lib/kiloclaw/cli-runs.ts
    - Use createCliRun in kiloclaw-router, organization-kiloclaw-router, admin router
    - Update cancelKiloCliRun to mark admin runs as cancelled in DB
    - Add initiatedBy filter to listAllCliRuns query and UI
    - Show Admin badge in CLI runs list for admin-initiated runs
    - Remove unused audit log and scrubPrompt from admin router

# Conflicts:
#	packages/db/src/migrations/meta/0074_snapshot.json
#	packages/db/src/migrations/meta/_journal.json
Return the created run id from admin-started CLI runs and use it to scope admin status and cancel flows to a single persisted row. Also extract shared CLI cancellation persistence so personal, org, and admin paths update terminal state consistently.
@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot bot commented Apr 13, 2026

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 2
SUGGESTION 0
Issue Details (click to expand)

WARNING

File Line Issue
apps/web/src/lib/kiloclaw/cli-runs.ts 180 Successful cancellation still does not update historical instance-scoped rows when the caller has no active personal instance
apps/web/src/lib/kiloclaw/cli-runs.ts 243 Terminal controller status still does not persist for historical instance-scoped rows in the same fallback path
Other Observations (not in diff)

None.

Files Reviewed (4 files)
  • apps/web/src/lib/kiloclaw/cli-runs.ts - 2 issues
  • apps/web/src/lib/kiloclaw/instance-registry.ts - 0 issues
  • apps/web/src/lib/kiloclaw/cli-runs.test.ts - 0 issues
  • apps/web/src/routers/kiloclaw-router.test.ts - 0 issues

Fix these issues in Kilo Cloud


Reviewed by gpt-5.4-20260305 · 465,695 tokens

@@ -0,0 +1,2 @@
ALTER TABLE "kiloclaw_cli_runs" ADD COLUMN "initiated_by_admin_id" text;--> statement-breakpoint
ALTER TABLE "kiloclaw_cli_runs" ADD CONSTRAINT "kiloclaw_cli_runs_initiated_by_admin_id_kilocode_users_id_fk" FOREIGN KEY ("initiated_by_admin_id") REFERENCES "public"."kilocode_users"("id") ON DELETE set null ON UPDATE no action; No newline at end of file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Validating this foreign key can block writes

Adding a foreign key on an existing populated table validates every existing row while holding a lock that conflicts with inserts and updates on kiloclaw_cli_runs. Consider adding the constraint as NOT VALID and validating it in a separate step so deploys do not stall CLI run writes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[bot] verify/fix in schema.ts and I will regenerate.

Copy link
Copy Markdown
Contributor Author

@evanjacobson evanjacobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Submitting pending review to unblock replies.

const { data } = await kiloCliRunHistory.refetch();
const latestRun = selectCurrentCliRun(data?.runs, status.instanceId);
if (latestRun?.status === 'running') {
router.push(`/claw/kilo-cli-run/${latestRun.id}`);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a run in progress, clicking the Recover with Kilo button will directly route into the run view.

This UX will be improved in a follow up soon after this PR, so this is the MVP as far as providing a decent UX with the enforcement of "one run at a time"

/>
<div className="flex items-center gap-3">
<Button variant="ghost" size="sm" onClick={() => router.push('/claw')}>
<Button variant="ghost" size="sm" onClick={() => router.push('/claw/settings')}>
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Routes the user back to the settings page instead of the chat page

onOpenChange(false);
router.push(`/claw/kilo-cli-run/${data.id}`);
},
onError: err => {
Copy link
Copy Markdown
Contributor Author

@evanjacobson evanjacobson Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[bot] we can probably streamline this.

...trpc.admin.kiloclawInstances.getKiloCliRunStatus.queryOptions(
selectedRunId
? { userId, instanceId, runId: selectedRunId }
: { userId, instanceId, runId: '00000000-0000-0000-0000-000000000000' }
Copy link
Copy Markdown
Contributor Author

@evanjacobson evanjacobson Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[bot] why are we manually setting runId here?

<DetailField label="Status">
<Badge
className={
runStatus.status === 'running'
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4x nested ternary is the most I've personally witnessed.

.notNull()
.references(() => kilocode_users.id, { onDelete: 'cascade' }),
instance_id: uuid().references(() => kiloclaw_instances.id),
initiated_by_admin_id: text().references(() => kilocode_users.id, { onDelete: 'set null' }),
Copy link
Copy Markdown
Contributor Author

@evanjacobson evanjacobson Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to kiloclaw_cli_runs

const parsed = StartRunBodySchema.safeParse(body);
if (!parsed.success) {
return c.json(
{ error: 'Invalid request body', details: z.treeifyError(parsed.error) },
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error.flatten() is deprecated in favor of z.treeifyError(error)

const status = (err as { status: number }).status;
if (status >= 400 && status < 600) return status;
// Extract a valid HTTP status from the error or its cause, defaulting to 500.
for (const candidate of [err, err instanceof Error ? err.cause : undefined]) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just a refactor for the sake of clarity

typeof (err as { code?: unknown }).code === 'string'
? (err as { code: string }).code
: undefined;
: typeof err === 'object' &&
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[bot] can we leverage a helper method here? Or at least explain to me what's happening

- Revert unnecessary InstanceControls componentization
- Add undefined guard on createCliRun row insert
- Document baseProcedure choice on listKiloCliRuns
- Add comment explaining placeholder UUID in KiloCliRunCard
- Extract getErrorCode helper in platform.ts to replace deep as-cast chain
</div>
{run.initiated_by_admin_id && (
<p className="text-muted-foreground text-xs">
Initiated by {run.initiated_by_admin_email ?? 'Admin'}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is useful info at a glance for us internally. Will not be available for users.

await markCliRunCancelled({
runId: params.runId,
userId: params.userId,
instanceId: params.instanceId,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Successful cancel will not update historical instance-scoped rows

When the caller has no active personal instance, cancelCliRun() can now find a destroyed-instance run via the relaxed lookup, but params.instanceId is still null. markCliRunCancelled() treats null as instance_id IS NULL, so this update misses the matched row and leaves it marked running in Postgres even after the worker confirms cancellation.

await persistCliRunControllerStatus({
runId: params.runId,
userId: params.userId,
instanceId: params.instanceId,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Terminal status still won't persist for historical runs

The same params.instanceId value is reused here after loading a destroyed-instance row. In the no-active-instance fallback that value remains null, so persistCliRunControllerStatus() updates only instance_id IS NULL rows and skips the matched historical run. That leaves the DB row stuck at running after the controller reports completion or failure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant