Skip to content

feat: verifier-gated model router (run cheap → verify → escalate)#854

Draft
anandgupta42 wants to merge 1 commit into
mainfrom
feat/verifier-gated-router
Draft

feat: verifier-gated model router (run cheap → verify → escalate)#854
anandgupta42 wants to merge 1 commit into
mainfrom
feat/verifier-gated-router

Conversation

@anandgupta42
Copy link
Copy Markdown
Contributor

@anandgupta42 anandgupta42 commented May 31, 2026

What does this PR do?

Adds a verifier-gated model router: run a cheap model first, verify the result deterministically (dbt build), and escalate to a stronger model only when verification fails. Flag-gated via ALTIMATE_ROUTER (default off) — the normal single-model run path is byte-for-byte unchanged.

  • packages/opencode/src/router/verifier.ts — deterministic Verdict from dbt build/dbt test. Spoof-proof (parses the last summary + uses exitCode as backstop, so a model-emitted fake Done. PASS=… ERROR=0 cannot pass the gate). Pluggable Impl; honest unverifiable state when a gate can't run.
  • packages/opencode/src/router/router.ts — the escalation ladder. Runs each tier, verifies, escalates on a failed verdict with the exact failing checks as context, stops at the first pass. Per-tier exception handling (a transient tier error escalates instead of aborting). Ladder overridable via ALTIMATE_ROUTER_LADDER.
  • packages/opencode/src/router/policy.ts — where the ladder comes from: a static default, or a per-context policy fetched from the altimate API when ALTIMATE_API_KEY is set (with AbortSignal timeouts + sanitizeTiers validation/capping; degrades to static on any failure).
  • packages/opencode/src/router/verdict.ts — a machine-checkable verdict envelope (schemaVersion, per-attempt history, checks, evidence hash, optional signature).
  • packages/opencode/src/cli/cmd/run.ts — orchestration when the flag is on. Only routes verifiable (dbt) workspaces; a non-dbt project runs once with the user's own model (no silent downgrade). dbt build runs with a hard timeout.

The customer routing policy endpoint lives in altimate-backend (separate PR). This PR ships the client + the static default.

Type of change

  • New feature (non-breaking change which adds functionality)

Issue for this PR

Closes #853

How did you verify your code works?

  • 51 unit tests (test/router/*.test.ts), including adversarial cases: dbt summary-line injection, ANSI/huge/multi-summary output, endpoint response validation + cost-bomb capping, and per-tier exception escalation.
  • 3 env-gated E2E suites (test/router/*.e2e.test.ts) run with real dependencies (no mocks): real dbt build in a container (incl. a spoof-attack that fails to fool the gate), real OpenRouter model calls with real escalation, and real-network policy fallback + adversarial endpoint responses.
  • tsgo --noEmit typecheck clean on all changed files; marker check clean.
  • Live CLI smoke (ALTIMATE_ROUTER=1 on a real dbt project): routed to the cheap tier → real agent run → real dbt build verify → verdict emitted, no escalation needed.
  • Reviewed via multi-model consensus (/consensus:code-review); all CRITICAL/MAJOR findings (non-dbt gating, external-call timeouts, per-tier exception handling, honest unverifiable verdict, endpoint hardening) applied before opening.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Draft — execute()-level follow-ups (reuse one session across tiers; finally cleanup of per-tier listeners/tracer) tracked for a follow-up before marking ready

Summary by cubic

Adds a verifier‑gated model router that runs a cheap model first, verifies with dbt build, and escalates only on failure. Flag-gated via ALTIMATE_ROUTER (default off); non‑dbt projects keep the existing single‑model flow.

  • New Features

    • Router: tier ladder (cheap → strong), verify after each run, escalate on failed verdict with failing checks as context; stops at first pass.
    • Verifier: parses the last dbt summary and checks exitCode; reports failing nodes; returns unverifiable when the gate can’t run.
    • Policy: static default ladder or Altimate API policy when ALTIMATE_API_KEY is set; validated and capped; falls back to static on any failure.
    • Verdict envelope: machine-checkable record (schema version, attempts, checks, evidence hash, optional signature).
    • CLI: routing only in verifiable dbt workspaces; non‑dbt runs once; dbt build has a 300s hard timeout.
    • Env: ALTIMATE_ROUTER, ALTIMATE_ROUTER_LADDER, ALTIMATE_API_KEY, ALTIMATE_API_URL.
  • Migration

    • No breaking changes; routing is off by default.
    • Enable in dbt projects with ALTIMATE_ROUTER=1.
    • Optional: set ALTIMATE_API_KEY to use the Altimate policy and outcome reporting.
    • Optional: override the ladder via ALTIMATE_ROUTER_LADDER (comma-separated provider/model).

Written for commit fd63be9. Summary will update on new commits.

Review in cubic

Summary by CodeRabbit

  • New Features

    • Verifier-gated router orchestration in the run command enables multi-tier policy routing with workspace verification.
    • Integrated dbt-based workspace verification to validate outcomes deterministically.
    • Escalation logic threads failure context across tiers to guide subsequent attempts.
    • Verdict envelopes capture and report routing outcomes.
    • Support for both API-driven and static routing policies.
  • Documentation

    • Added comprehensive router module documentation.
  • Tests

    • Added extensive unit and end-to-end test coverage for router components.

Run a cheap model first, verify the result deterministically (`dbt build`), and
escalate to a stronger model only when verification fails. Flag-gated
(`ALTIMATE_ROUTER`), default off — the normal single-model path is unchanged.

- `router/verifier.ts` — deterministic `Verdict` from `dbt build`/`dbt test`;
  spoof-proof (last-summary + exitCode backstop); pluggable `Impl`; honest
  `unverifiable` state when a gate cannot run
- `router/router.ts` — escalation ladder; per-tier exception handling so a
  transient tier failure escalates instead of aborting; env ladder override
- `router/policy.ts` — static default ladder or an altimate-API-served policy
  (key-gated, `AbortSignal` timeouts, `sanitizeTiers` validation + cap)
- `router/verdict.ts` — verdict envelope (schemaVersion, evidence hash, signer seam)
- `cli/cmd/run.ts` — orchestrator: routes only verifiable (dbt) workspaces;
  non-dbt projects run once with the user's model (no silent downgrade)
- 51 unit tests + 3 env-gated E2E suites (real dbt / real model / real network)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 31, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4c5d45f3-993c-4dba-ab1b-37645d7029a5

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/verifier-gated-router

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (2)
packages/opencode/src/router/README.md (1)

31-35: ⚡ Quick win

Consider documenting the dbt build timeout.

The integration description is accurate but could mention that dbt build runs with a 300-second hard timeout to prevent hung builds from stalling the router. This timeout is a user-facing constraint that could affect runs in large dbt projects.

📝 Suggested addition
 ## Integration
 `src/cli/cmd/run.ts` (`RunCommand`): when `Router.enabled()`, the run resolves a policy,
 runs each tier by re-invoking the existing run path with that model (escalation note
-prepended) in the same workspace, verifies with `dbt build` between tiers, and emits a
-verdict envelope. The default (non-router) path is untouched.
+prepended) in the same workspace, verifies with `dbt build` (300s timeout) between tiers,
+and emits a verdict envelope. The default (non-router) path is untouched.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/router/README.md` around lines 31 - 35, Update the
Integration section to mention that when Router.enabled() triggers RunCommand
(src/cli/cmd/run.ts) the intermediate dbt build verification step is executed
with a hard 300-second timeout; explicitly state that dbt build runs are limited
to 300 seconds to avoid hung builds stalling the router and that this is a
user-facing constraint for large dbt projects.
packages/opencode/test/router/verifier.e2e.test.ts (1)

21-31: ⚖️ Poor tradeoff

Prefer the tmpdir() fixture over raw mkdtempSync.

Temp-dir creation/cleanup here is hand-rolled and tracked in a module-level dirs array. The repo convention is to create temp dirs via the shared fixture with automatic cleanup, which would also let you drop the dirs/afterAll bookkeeping.

As per coding guidelines: "Use the tmpdir function from fixture/fixture.ts to create temporary directories for tests with automatic cleanup" and "Always use await using syntax with tmpdir() for automatic cleanup".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/router/verifier.e2e.test.ts` around lines 21 - 31, The
project helper currently hand-rolls temp dirs with mkdtempSync and a
module-level dirs array; change it to use the shared tmpdir fixture with
automatic cleanup by converting the helper to accept (or be called with) the
tmpdir() fixture and using "await using tmpdir()" in the test instead of
mkdtempSync, remove the dirs bookkeeping/afterAll cleanup, and create the
dbt_project.yml, profiles.yml, models directory and files inside the provided
fixture path; reference the project(...) helper to accept a fixture path
parameter (or inline its logic in the test using tmpdir()) so all temporary
directories are managed by fixture/fixture.ts.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/test/router/router.e2e.test.ts`:
- Around line 68-70: The cleanup in the afterAll block currently runs a
privileged `sudo rm -rf` via Bun.spawnSync which is unnecessary and dangerous;
remove the Bun.spawnSync invocation and keep the safe Node removal using
rmSync(d, { recursive: true, force: true }) (still wrapped in the existing
try/catch). Update the afterAll cleanup that iterates over dirs so it only calls
rmSync for each d (and no longer calls Bun.spawnSync or any sudo command),
referencing the existing afterAll block, the dirs variable, and rmSync call to
locate the change.
- Around line 44-58: The test's fetch call does not check the HTTP response and
can silently fall back to a default SQL via extractSql when the API returns an
error; update the POST response handling in the router.e2e.test.ts fetch block
to verify res.ok, and if not ok read and include the response status and body
(text or json) in a thrown error so the test fails loudly instead of using the
fallback "select 1 as id"; ensure this check happens before parsing j and before
calling extractSql so failures from the API (auth, rate limits, invalid model)
surface immediately.
- Around line 62-67: The beforeAll currently throws when env prerequisites
(KEY/IMG) or the Docker image check (Bun.spawnSync) fail, causing CI failures;
change the test harness to skip the entire suite instead of throwing by gating
the describe — detect missing OPENROUTER_API_KEY (KEY) or E2E_IMG (IMG) or
missing image via Bun.spawnSync and call describe.skip (or wrap the describe in
a conditional) so the suite is skipped when prerequisites aren't present; locate
the existing beforeAll/describe in router.e2e.test.ts and update the logic
around KEY, IMG and Bun.spawnSync to skip rather than throw.

In `@packages/opencode/test/router/verifier.e2e.test.ts`:
- Around line 38-40: The cleanup block uses Bun.spawnSync(["sudo", "rm", "-rf",
d]) which is unnecessary and dangerous; remove the privileged delete call and
rely solely on the existing rmSync(d, { recursive: true, force: true }) inside
the afterAll cleanup (the function referencing afterAll, dirs, Bun.spawnSync and
rmSync), preserving the try/catch to handle errors and keeping use of
mkdtempSync-created dirs as the source of these temp paths.
- Around line 33-37: The suite currently throws inside beforeAll when the env
var IMG (E2E_IMG) is unset, which fails CI; instead gate the whole suite using
the test runner's conditional skip (e.g., wrap the top-level describe in
describe.skipIf(!IMG) or use describe(…) with conditional describe.skip) so the
suite is skipped when IMG/E2E_IMG is not provided; remove the unconditional
throw in beforeAll and keep the docker image existence check there only when IMG
is present (use Bun.spawnSync as currently done) to fail fast only for explicit
runs.

In `@packages/opencode/test/router/verifier.test.ts`:
- Around line 81-84: The test "ANSI color codes around the summary do not break
parsing" currently uses plain text in the `ansi` variable so it doesn't actually
verify ANSI handling; update the test in `verifier.test.ts` to include real
escape sequences (e.g. use `\x1b` or `\u001b` sequences like "\x1b[31m" /
"\x1b[0m" wrapped around the summary substring) so that Verifier.parseDbtSummary
is exercised with actual ANSI color codes and still returns pass=12; reference
the `Verifier.parseDbtSummary` call and the `ansi` variable when modifying the
test string.

---

Nitpick comments:
In `@packages/opencode/src/router/README.md`:
- Around line 31-35: Update the Integration section to mention that when
Router.enabled() triggers RunCommand (src/cli/cmd/run.ts) the intermediate dbt
build verification step is executed with a hard 300-second timeout; explicitly
state that dbt build runs are limited to 300 seconds to avoid hung builds
stalling the router and that this is a user-facing constraint for large dbt
projects.

In `@packages/opencode/test/router/verifier.e2e.test.ts`:
- Around line 21-31: The project helper currently hand-rolls temp dirs with
mkdtempSync and a module-level dirs array; change it to use the shared tmpdir
fixture with automatic cleanup by converting the helper to accept (or be called
with) the tmpdir() fixture and using "await using tmpdir()" in the test instead
of mkdtempSync, remove the dirs bookkeeping/afterAll cleanup, and create the
dbt_project.yml, profiles.yml, models directory and files inside the provided
fixture path; reference the project(...) helper to accept a fixture path
parameter (or inline its logic in the test using tmpdir()) so all temporary
directories are managed by fixture/fixture.ts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 67cd8f4e-abeb-4a8e-90bb-4057a97ade1e

📥 Commits

Reviewing files that changed from the base of the PR and between a490bd4 and fd63be9.

📒 Files selected for processing (13)
  • packages/opencode/src/cli/cmd/run.ts
  • packages/opencode/src/router/README.md
  • packages/opencode/src/router/policy.ts
  • packages/opencode/src/router/router.ts
  • packages/opencode/src/router/verdict.ts
  • packages/opencode/src/router/verifier.ts
  • packages/opencode/test/router/policy.e2e.test.ts
  • packages/opencode/test/router/policy.test.ts
  • packages/opencode/test/router/router.e2e.test.ts
  • packages/opencode/test/router/router.test.ts
  • packages/opencode/test/router/verdict.test.ts
  • packages/opencode/test/router/verifier.e2e.test.ts
  • packages/opencode/test/router/verifier.test.ts

Comment on lines +44 to +58
const res = await fetch(`${OR}/chat/completions`, {
method: "POST",
headers: { "Content-Type": "application/json", Authorization: `Bearer ${KEY}` },
body: JSON.stringify({
model: apiModel,
messages: [
{ role: "system", content: "You are a dbt engineer. Output ONLY the SQL for the requested model in a ```sql code block. No prose, no schema.yml." },
{ role: "user", content: task + (note ? `\n\nA PREVIOUS ATTEMPT FAILED VERIFICATION:\n${note}` : "") },
],
max_tokens: 600,
temperature: 0,
}),
})
const j: any = await res.json()
const sql = extractSql(j?.choices?.[0]?.message?.content ?? "select 1 as id")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

realRunAgent swallows HTTP/API errors and silently writes a fallback model.

There's no res.ok check; on a non-200 (rate limit, auth failure, invalid model) j?.choices?.[0]?.message?.content is undefined and the test silently writes select 1 as id. That can make an escalation test pass for the wrong reason. Surface the failure instead.

🛡️ Fail loudly on a bad response
   const j: any = await res.json()
+  if (!res.ok || !j?.choices?.[0]?.message?.content)
+    throw new Error(`OpenRouter call failed (${res.status}): ${JSON.stringify(j).slice(0, 200)}`)
   const sql = extractSql(j?.choices?.[0]?.message?.content ?? "select 1 as id")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const res = await fetch(`${OR}/chat/completions`, {
method: "POST",
headers: { "Content-Type": "application/json", Authorization: `Bearer ${KEY}` },
body: JSON.stringify({
model: apiModel,
messages: [
{ role: "system", content: "You are a dbt engineer. Output ONLY the SQL for the requested model in a ```sql code block. No prose, no schema.yml." },
{ role: "user", content: task + (note ? `\n\nA PREVIOUS ATTEMPT FAILED VERIFICATION:\n${note}` : "") },
],
max_tokens: 600,
temperature: 0,
}),
})
const j: any = await res.json()
const sql = extractSql(j?.choices?.[0]?.message?.content ?? "select 1 as id")
const res = await fetch(`${OR}/chat/completions`, {
method: "POST",
headers: { "Content-Type": "application/json", Authorization: `Bearer ${KEY}` },
body: JSON.stringify({
model: apiModel,
messages: [
{ role: "system", content: "You are a dbt engineer. Output ONLY the SQL for the requested model in a
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/router/router.e2e.test.ts` around lines 44 - 58, The
test's fetch call does not check the HTTP response and can silently fall back to
a default SQL via extractSql when the API returns an error; update the POST
response handling in the router.e2e.test.ts fetch block to verify res.ok, and if
not ok read and include the response status and body (text or json) in a thrown
error so the test fails loudly instead of using the fallback "select 1 as id";
ensure this check happens before parsing j and before calling extractSql so
failures from the API (auth, rate limits, invalid model) surface immediately.

Comment on lines +62 to +67
beforeAll(() => {
if (!KEY) throw new Error("OPENROUTER_API_KEY required for router E2E")
if (!IMG) throw new Error("E2E_IMG not set — provide a docker image with dbt-duckdb")
if (Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" }).exitCode !== 0)
throw new Error(`image ${IMG} missing`)
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Same CI-fail pattern: skip instead of throwing when OPENROUTER_API_KEY/E2E_IMG are absent.

Static analysis confirms this suite throws (OPENROUTER_API_KEY required) in the pipeline. Gate the describe so the suite is skipped, not failed, when the opt-in prerequisites aren't present.

🐛 Skip when prerequisites are missing
-beforeAll(() => {
-  if (!KEY) throw new Error("OPENROUTER_API_KEY required for router E2E")
-  if (!IMG) throw new Error("E2E_IMG not set — provide a docker image with dbt-duckdb")
-  if (Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" }).exitCode !== 0)
-    throw new Error(`image ${IMG} missing`)
-})
+beforeAll(() => {
+  if (Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" }).exitCode !== 0)
+    throw new Error(`image ${IMG} missing`)
+})

And gate the suite (Line 72):

-describe("Router × REAL OpenRouter + REAL dbt (no mocks)", () => {
+const describeE2E = KEY && IMG ? describe : describe.skip
+describeE2E("Router × REAL OpenRouter + REAL dbt (no mocks)", () => {
🧰 Tools
🪛 GitHub Check: TypeScript

[failure] 63-63: error: OPENROUTER_API_KEY required for router E2E

  at <anonymous> (/home/runner/work/altimate-code/altimate-code/packages/opencode/test/router/router.e2e.test.ts:63:73)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/router/router.e2e.test.ts` around lines 62 - 67, The
beforeAll currently throws when env prerequisites (KEY/IMG) or the Docker image
check (Bun.spawnSync) fail, causing CI failures; change the test harness to skip
the entire suite instead of throwing by gating the describe — detect missing
OPENROUTER_API_KEY (KEY) or E2E_IMG (IMG) or missing image via Bun.spawnSync and
call describe.skip (or wrap the describe in a conditional) so the suite is
skipped when prerequisites aren't present; locate the existing
beforeAll/describe in router.e2e.test.ts and update the logic around KEY, IMG
and Bun.spawnSync to skip rather than throw.

Comment on lines +68 to +70
afterAll(() => {
for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {}
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drop sudo rm -rf here as well.

Same concern as verifier.e2e.test.ts: the privileged delete is redundant with rmSync and a dangerous pattern. Temp dirs from mkdtempSync don't need elevated removal.

🛡️ Remove the privileged delete
-afterAll(() => {
-  for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {}
-})
+afterAll(() => {
+  for (const d of dirs) try { rmSync(d, { recursive: true, force: true }) } catch {}
+})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
afterAll(() => {
for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {}
})
afterAll(() => {
for (const d of dirs) try { rmSync(d, { recursive: true, force: true }) } catch {}
})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/router/router.e2e.test.ts` around lines 68 - 70, The
cleanup in the afterAll block currently runs a privileged `sudo rm -rf` via
Bun.spawnSync which is unnecessary and dangerous; remove the Bun.spawnSync
invocation and keep the safe Node removal using rmSync(d, { recursive: true,
force: true }) (still wrapped in the existing try/catch). Update the afterAll
cleanup that iterates over dirs so it only calls rmSync for each d (and no
longer calls Bun.spawnSync or any sudo command), referencing the existing
afterAll block, the dirs variable, and rmSync call to locate the change.

Comment on lines +33 to +37
beforeAll(() => {
if (!IMG) throw new Error("E2E_IMG not set — provide a docker image with dbt-duckdb")
const ok = Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" })
if (ok.exitCode !== 0) throw new Error(`E2E image ${IMG} not present`)
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Throwing in beforeAll when E2E_IMG is unset fails CI instead of skipping.

Static analysis confirms this suite is throwing in the pipeline (E2E_IMG not set). An opt-in, infra-dependent suite should skip when its prerequisite env var is missing, not hard-fail the run. Gate the describe with describe.skipIf.

🐛 Skip the suite when the image is not provided
-beforeAll(() => {
-  if (!IMG) throw new Error("E2E_IMG not set — provide a docker image with dbt-duckdb")
-  const ok = Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" })
-  if (ok.exitCode !== 0) throw new Error(`E2E image ${IMG} not present`)
-})
+beforeAll(() => {
+  const ok = Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" })
+  if (ok.exitCode !== 0) throw new Error(`E2E image ${IMG} not present`)
+})

And gate the suite (Line 42):

-describe("Verifier × REAL dbt (no mocks)", () => {
+const describeE2E = IMG ? describe : describe.skip
+describeE2E("Verifier × REAL dbt (no mocks)", () => {
🧰 Tools
🪛 GitHub Check: TypeScript

[failure] 34-34: error: provide a docker image with dbt-duckdb
provide a docker image with dbt-duckdb
at (/home/runner/work/altimate-code/altimate-code/packages/opencode/test/router/verifier.e2e.test.ts:34:86)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/router/verifier.e2e.test.ts` around lines 33 - 37, The
suite currently throws inside beforeAll when the env var IMG (E2E_IMG) is unset,
which fails CI; instead gate the whole suite using the test runner's conditional
skip (e.g., wrap the top-level describe in describe.skipIf(!IMG) or use
describe(…) with conditional describe.skip) so the suite is skipped when
IMG/E2E_IMG is not provided; remove the unconditional throw in beforeAll and
keep the docker image existence check there only when IMG is present (use
Bun.spawnSync as currently done) to fail fast only for explicit runs.

Comment on lines +38 to +40
afterAll(() => {
for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {}
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Drop sudo rm -rf from cleanup.

Bun.spawnSync(["sudo", "rm", "-rf", d]) requires passwordless sudo, is redundant with the following rmSync, and sudo rm -rf is a dangerous footgun if d is ever empty/unexpected. The directories come from mkdtempSync, so the plain rmSync is sufficient.

🛡️ Remove the privileged delete
-afterAll(() => {
-  for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {}
-})
+afterAll(() => {
+  for (const d of dirs) try { rmSync(d, { recursive: true, force: true }) } catch {}
+})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
afterAll(() => {
for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {}
})
afterAll(() => {
for (const d of dirs) try { rmSync(d, { recursive: true, force: true }) } catch {}
})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/router/verifier.e2e.test.ts` around lines 38 - 40, The
cleanup block uses Bun.spawnSync(["sudo", "rm", "-rf", d]) which is unnecessary
and dangerous; remove the privileged delete call and rely solely on the existing
rmSync(d, { recursive: true, force: true }) inside the afterAll cleanup (the
function referencing afterAll, dirs, Bun.spawnSync and rmSync), preserving the
try/catch to handle errors and keeping use of mkdtempSync-created dirs as the
source of these temp paths.

Comment on lines +81 to +84
test("ANSI color codes around the summary do not break parsing", () => {
const ansi = "01:00:00 Done. PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12"
expect(Verifier.parseDbtSummary(ansi)?.pass).toBe(12)
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

ANSI test uses no ANSI codes — it doesn't exercise the claim.

ansi is plain text, so this passes trivially regardless of whether the parser handles color codes (see related concern in verifier.ts Line 66-73). Add real escape sequences to make this meaningful.

💚 Inject real ANSI sequences
   test("ANSI color codes around the summary do not break parsing", () => {
-    const ansi = "01:00:00  Done. PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12"
+    const ansi = "01:00:00  \x1b[32mDone.\x1b[0m PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12"
     expect(Verifier.parseDbtSummary(ansi)?.pass).toBe(12)
   })
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
test("ANSI color codes around the summary do not break parsing", () => {
const ansi = "[0m01:00:00 Done. PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12[0m"
expect(Verifier.parseDbtSummary(ansi)?.pass).toBe(12)
})
test("ANSI color codes around the summary do not break parsing", () => {
const ansi = "01:00:00 \x1b[32mDone.\x1b[0m PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12"
expect(Verifier.parseDbtSummary(ansi)?.pass).toBe(12)
})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/test/router/verifier.test.ts` around lines 81 - 84, The
test "ANSI color codes around the summary do not break parsing" currently uses
plain text in the `ansi` variable so it doesn't actually verify ANSI handling;
update the test in `verifier.test.ts` to include real escape sequences (e.g. use
`\x1b` or `\u001b` sequences like "\x1b[31m" / "\x1b[0m" wrapped around the
summary substring) so that Verifier.parseDbtSummary is exercised with actual
ANSI color codes and still returns pass=12; reference the
`Verifier.parseDbtSummary` call and the `ansi` variable when modifying the test
string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: verifier-gated model router (run cheap → verify → escalate)

1 participant