feat: verifier-gated model router (run cheap → verify → escalate)#854
feat: verifier-gated model router (run cheap → verify → escalate)#854anandgupta42 wants to merge 1 commit into
Conversation
Run a cheap model first, verify the result deterministically (`dbt build`), and escalate to a stronger model only when verification fails. Flag-gated (`ALTIMATE_ROUTER`), default off — the normal single-model path is unchanged. - `router/verifier.ts` — deterministic `Verdict` from `dbt build`/`dbt test`; spoof-proof (last-summary + exitCode backstop); pluggable `Impl`; honest `unverifiable` state when a gate cannot run - `router/router.ts` — escalation ladder; per-tier exception handling so a transient tier failure escalates instead of aborting; env ladder override - `router/policy.ts` — static default ladder or an altimate-API-served policy (key-gated, `AbortSignal` timeouts, `sanitizeTiers` validation + cap) - `router/verdict.ts` — verdict envelope (schemaVersion, evidence hash, signer seam) - `cli/cmd/run.ts` — orchestrator: routes only verifiable (dbt) workspaces; non-dbt projects run once with the user's model (no silent downgrade) - 51 unit tests + 3 env-gated E2E suites (real dbt / real model / real network) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (2)
packages/opencode/src/router/README.md (1)
31-35: ⚡ Quick winConsider documenting the dbt build timeout.
The integration description is accurate but could mention that
dbt buildruns with a 300-second hard timeout to prevent hung builds from stalling the router. This timeout is a user-facing constraint that could affect runs in large dbt projects.📝 Suggested addition
## Integration `src/cli/cmd/run.ts` (`RunCommand`): when `Router.enabled()`, the run resolves a policy, runs each tier by re-invoking the existing run path with that model (escalation note -prepended) in the same workspace, verifies with `dbt build` between tiers, and emits a -verdict envelope. The default (non-router) path is untouched. +prepended) in the same workspace, verifies with `dbt build` (300s timeout) between tiers, +and emits a verdict envelope. The default (non-router) path is untouched.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opencode/src/router/README.md` around lines 31 - 35, Update the Integration section to mention that when Router.enabled() triggers RunCommand (src/cli/cmd/run.ts) the intermediate dbt build verification step is executed with a hard 300-second timeout; explicitly state that dbt build runs are limited to 300 seconds to avoid hung builds stalling the router and that this is a user-facing constraint for large dbt projects.packages/opencode/test/router/verifier.e2e.test.ts (1)
21-31: ⚖️ Poor tradeoffPrefer the
tmpdir()fixture over rawmkdtempSync.Temp-dir creation/cleanup here is hand-rolled and tracked in a module-level
dirsarray. The repo convention is to create temp dirs via the shared fixture with automatic cleanup, which would also let you drop thedirs/afterAllbookkeeping.As per coding guidelines: "Use the
tmpdirfunction fromfixture/fixture.tsto create temporary directories for tests with automatic cleanup" and "Always useawait usingsyntax withtmpdir()for automatic cleanup".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opencode/test/router/verifier.e2e.test.ts` around lines 21 - 31, The project helper currently hand-rolls temp dirs with mkdtempSync and a module-level dirs array; change it to use the shared tmpdir fixture with automatic cleanup by converting the helper to accept (or be called with) the tmpdir() fixture and using "await using tmpdir()" in the test instead of mkdtempSync, remove the dirs bookkeeping/afterAll cleanup, and create the dbt_project.yml, profiles.yml, models directory and files inside the provided fixture path; reference the project(...) helper to accept a fixture path parameter (or inline its logic in the test using tmpdir()) so all temporary directories are managed by fixture/fixture.ts.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/opencode/test/router/router.e2e.test.ts`:
- Around line 68-70: The cleanup in the afterAll block currently runs a
privileged `sudo rm -rf` via Bun.spawnSync which is unnecessary and dangerous;
remove the Bun.spawnSync invocation and keep the safe Node removal using
rmSync(d, { recursive: true, force: true }) (still wrapped in the existing
try/catch). Update the afterAll cleanup that iterates over dirs so it only calls
rmSync for each d (and no longer calls Bun.spawnSync or any sudo command),
referencing the existing afterAll block, the dirs variable, and rmSync call to
locate the change.
- Around line 44-58: The test's fetch call does not check the HTTP response and
can silently fall back to a default SQL via extractSql when the API returns an
error; update the POST response handling in the router.e2e.test.ts fetch block
to verify res.ok, and if not ok read and include the response status and body
(text or json) in a thrown error so the test fails loudly instead of using the
fallback "select 1 as id"; ensure this check happens before parsing j and before
calling extractSql so failures from the API (auth, rate limits, invalid model)
surface immediately.
- Around line 62-67: The beforeAll currently throws when env prerequisites
(KEY/IMG) or the Docker image check (Bun.spawnSync) fail, causing CI failures;
change the test harness to skip the entire suite instead of throwing by gating
the describe — detect missing OPENROUTER_API_KEY (KEY) or E2E_IMG (IMG) or
missing image via Bun.spawnSync and call describe.skip (or wrap the describe in
a conditional) so the suite is skipped when prerequisites aren't present; locate
the existing beforeAll/describe in router.e2e.test.ts and update the logic
around KEY, IMG and Bun.spawnSync to skip rather than throw.
In `@packages/opencode/test/router/verifier.e2e.test.ts`:
- Around line 38-40: The cleanup block uses Bun.spawnSync(["sudo", "rm", "-rf",
d]) which is unnecessary and dangerous; remove the privileged delete call and
rely solely on the existing rmSync(d, { recursive: true, force: true }) inside
the afterAll cleanup (the function referencing afterAll, dirs, Bun.spawnSync and
rmSync), preserving the try/catch to handle errors and keeping use of
mkdtempSync-created dirs as the source of these temp paths.
- Around line 33-37: The suite currently throws inside beforeAll when the env
var IMG (E2E_IMG) is unset, which fails CI; instead gate the whole suite using
the test runner's conditional skip (e.g., wrap the top-level describe in
describe.skipIf(!IMG) or use describe(…) with conditional describe.skip) so the
suite is skipped when IMG/E2E_IMG is not provided; remove the unconditional
throw in beforeAll and keep the docker image existence check there only when IMG
is present (use Bun.spawnSync as currently done) to fail fast only for explicit
runs.
In `@packages/opencode/test/router/verifier.test.ts`:
- Around line 81-84: The test "ANSI color codes around the summary do not break
parsing" currently uses plain text in the `ansi` variable so it doesn't actually
verify ANSI handling; update the test in `verifier.test.ts` to include real
escape sequences (e.g. use `\x1b` or `\u001b` sequences like "\x1b[31m" /
"\x1b[0m" wrapped around the summary substring) so that Verifier.parseDbtSummary
is exercised with actual ANSI color codes and still returns pass=12; reference
the `Verifier.parseDbtSummary` call and the `ansi` variable when modifying the
test string.
---
Nitpick comments:
In `@packages/opencode/src/router/README.md`:
- Around line 31-35: Update the Integration section to mention that when
Router.enabled() triggers RunCommand (src/cli/cmd/run.ts) the intermediate dbt
build verification step is executed with a hard 300-second timeout; explicitly
state that dbt build runs are limited to 300 seconds to avoid hung builds
stalling the router and that this is a user-facing constraint for large dbt
projects.
In `@packages/opencode/test/router/verifier.e2e.test.ts`:
- Around line 21-31: The project helper currently hand-rolls temp dirs with
mkdtempSync and a module-level dirs array; change it to use the shared tmpdir
fixture with automatic cleanup by converting the helper to accept (or be called
with) the tmpdir() fixture and using "await using tmpdir()" in the test instead
of mkdtempSync, remove the dirs bookkeeping/afterAll cleanup, and create the
dbt_project.yml, profiles.yml, models directory and files inside the provided
fixture path; reference the project(...) helper to accept a fixture path
parameter (or inline its logic in the test using tmpdir()) so all temporary
directories are managed by fixture/fixture.ts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 67cd8f4e-abeb-4a8e-90bb-4057a97ade1e
📒 Files selected for processing (13)
packages/opencode/src/cli/cmd/run.tspackages/opencode/src/router/README.mdpackages/opencode/src/router/policy.tspackages/opencode/src/router/router.tspackages/opencode/src/router/verdict.tspackages/opencode/src/router/verifier.tspackages/opencode/test/router/policy.e2e.test.tspackages/opencode/test/router/policy.test.tspackages/opencode/test/router/router.e2e.test.tspackages/opencode/test/router/router.test.tspackages/opencode/test/router/verdict.test.tspackages/opencode/test/router/verifier.e2e.test.tspackages/opencode/test/router/verifier.test.ts
| const res = await fetch(`${OR}/chat/completions`, { | ||
| method: "POST", | ||
| headers: { "Content-Type": "application/json", Authorization: `Bearer ${KEY}` }, | ||
| body: JSON.stringify({ | ||
| model: apiModel, | ||
| messages: [ | ||
| { role: "system", content: "You are a dbt engineer. Output ONLY the SQL for the requested model in a ```sql code block. No prose, no schema.yml." }, | ||
| { role: "user", content: task + (note ? `\n\nA PREVIOUS ATTEMPT FAILED VERIFICATION:\n${note}` : "") }, | ||
| ], | ||
| max_tokens: 600, | ||
| temperature: 0, | ||
| }), | ||
| }) | ||
| const j: any = await res.json() | ||
| const sql = extractSql(j?.choices?.[0]?.message?.content ?? "select 1 as id") |
There was a problem hiding this comment.
realRunAgent swallows HTTP/API errors and silently writes a fallback model.
There's no res.ok check; on a non-200 (rate limit, auth failure, invalid model) j?.choices?.[0]?.message?.content is undefined and the test silently writes select 1 as id. That can make an escalation test pass for the wrong reason. Surface the failure instead.
🛡️ Fail loudly on a bad response
const j: any = await res.json()
+ if (!res.ok || !j?.choices?.[0]?.message?.content)
+ throw new Error(`OpenRouter call failed (${res.status}): ${JSON.stringify(j).slice(0, 200)}`)
const sql = extractSql(j?.choices?.[0]?.message?.content ?? "select 1 as id")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const res = await fetch(`${OR}/chat/completions`, { | |
| method: "POST", | |
| headers: { "Content-Type": "application/json", Authorization: `Bearer ${KEY}` }, | |
| body: JSON.stringify({ | |
| model: apiModel, | |
| messages: [ | |
| { role: "system", content: "You are a dbt engineer. Output ONLY the SQL for the requested model in a ```sql code block. No prose, no schema.yml." }, | |
| { role: "user", content: task + (note ? `\n\nA PREVIOUS ATTEMPT FAILED VERIFICATION:\n${note}` : "") }, | |
| ], | |
| max_tokens: 600, | |
| temperature: 0, | |
| }), | |
| }) | |
| const j: any = await res.json() | |
| const sql = extractSql(j?.choices?.[0]?.message?.content ?? "select 1 as id") | |
| const res = await fetch(`${OR}/chat/completions`, { | |
| method: "POST", | |
| headers: { "Content-Type": "application/json", Authorization: `Bearer ${KEY}` }, | |
| body: JSON.stringify({ | |
| model: apiModel, | |
| messages: [ | |
| { role: "system", content: "You are a dbt engineer. Output ONLY the SQL for the requested model in a |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opencode/test/router/router.e2e.test.ts` around lines 44 - 58, The
test's fetch call does not check the HTTP response and can silently fall back to
a default SQL via extractSql when the API returns an error; update the POST
response handling in the router.e2e.test.ts fetch block to verify res.ok, and if
not ok read and include the response status and body (text or json) in a thrown
error so the test fails loudly instead of using the fallback "select 1 as id";
ensure this check happens before parsing j and before calling extractSql so
failures from the API (auth, rate limits, invalid model) surface immediately.
| beforeAll(() => { | ||
| if (!KEY) throw new Error("OPENROUTER_API_KEY required for router E2E") | ||
| if (!IMG) throw new Error("E2E_IMG not set — provide a docker image with dbt-duckdb") | ||
| if (Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" }).exitCode !== 0) | ||
| throw new Error(`image ${IMG} missing`) | ||
| }) |
There was a problem hiding this comment.
Same CI-fail pattern: skip instead of throwing when OPENROUTER_API_KEY/E2E_IMG are absent.
Static analysis confirms this suite throws (OPENROUTER_API_KEY required) in the pipeline. Gate the describe so the suite is skipped, not failed, when the opt-in prerequisites aren't present.
🐛 Skip when prerequisites are missing
-beforeAll(() => {
- if (!KEY) throw new Error("OPENROUTER_API_KEY required for router E2E")
- if (!IMG) throw new Error("E2E_IMG not set — provide a docker image with dbt-duckdb")
- if (Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" }).exitCode !== 0)
- throw new Error(`image ${IMG} missing`)
-})
+beforeAll(() => {
+ if (Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" }).exitCode !== 0)
+ throw new Error(`image ${IMG} missing`)
+})And gate the suite (Line 72):
-describe("Router × REAL OpenRouter + REAL dbt (no mocks)", () => {
+const describeE2E = KEY && IMG ? describe : describe.skip
+describeE2E("Router × REAL OpenRouter + REAL dbt (no mocks)", () => {🧰 Tools
🪛 GitHub Check: TypeScript
[failure] 63-63: error: OPENROUTER_API_KEY required for router E2E
at <anonymous> (/home/runner/work/altimate-code/altimate-code/packages/opencode/test/router/router.e2e.test.ts:63:73)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opencode/test/router/router.e2e.test.ts` around lines 62 - 67, The
beforeAll currently throws when env prerequisites (KEY/IMG) or the Docker image
check (Bun.spawnSync) fail, causing CI failures; change the test harness to skip
the entire suite instead of throwing by gating the describe — detect missing
OPENROUTER_API_KEY (KEY) or E2E_IMG (IMG) or missing image via Bun.spawnSync and
call describe.skip (or wrap the describe in a conditional) so the suite is
skipped when prerequisites aren't present; locate the existing
beforeAll/describe in router.e2e.test.ts and update the logic around KEY, IMG
and Bun.spawnSync to skip rather than throw.
| afterAll(() => { | ||
| for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {} | ||
| }) |
There was a problem hiding this comment.
Drop sudo rm -rf here as well.
Same concern as verifier.e2e.test.ts: the privileged delete is redundant with rmSync and a dangerous pattern. Temp dirs from mkdtempSync don't need elevated removal.
🛡️ Remove the privileged delete
-afterAll(() => {
- for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {}
-})
+afterAll(() => {
+ for (const d of dirs) try { rmSync(d, { recursive: true, force: true }) } catch {}
+})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| afterAll(() => { | |
| for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {} | |
| }) | |
| afterAll(() => { | |
| for (const d of dirs) try { rmSync(d, { recursive: true, force: true }) } catch {} | |
| }) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opencode/test/router/router.e2e.test.ts` around lines 68 - 70, The
cleanup in the afterAll block currently runs a privileged `sudo rm -rf` via
Bun.spawnSync which is unnecessary and dangerous; remove the Bun.spawnSync
invocation and keep the safe Node removal using rmSync(d, { recursive: true,
force: true }) (still wrapped in the existing try/catch). Update the afterAll
cleanup that iterates over dirs so it only calls rmSync for each d (and no
longer calls Bun.spawnSync or any sudo command), referencing the existing
afterAll block, the dirs variable, and rmSync call to locate the change.
| beforeAll(() => { | ||
| if (!IMG) throw new Error("E2E_IMG not set — provide a docker image with dbt-duckdb") | ||
| const ok = Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" }) | ||
| if (ok.exitCode !== 0) throw new Error(`E2E image ${IMG} not present`) | ||
| }) |
There was a problem hiding this comment.
Throwing in beforeAll when E2E_IMG is unset fails CI instead of skipping.
Static analysis confirms this suite is throwing in the pipeline (E2E_IMG not set). An opt-in, infra-dependent suite should skip when its prerequisite env var is missing, not hard-fail the run. Gate the describe with describe.skipIf.
🐛 Skip the suite when the image is not provided
-beforeAll(() => {
- if (!IMG) throw new Error("E2E_IMG not set — provide a docker image with dbt-duckdb")
- const ok = Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" })
- if (ok.exitCode !== 0) throw new Error(`E2E image ${IMG} not present`)
-})
+beforeAll(() => {
+ const ok = Bun.spawnSync(["docker", "image", "inspect", IMG], { stdout: "ignore", stderr: "ignore" })
+ if (ok.exitCode !== 0) throw new Error(`E2E image ${IMG} not present`)
+})And gate the suite (Line 42):
-describe("Verifier × REAL dbt (no mocks)", () => {
+const describeE2E = IMG ? describe : describe.skip
+describeE2E("Verifier × REAL dbt (no mocks)", () => {🧰 Tools
🪛 GitHub Check: TypeScript
[failure] 34-34: error: provide a docker image with dbt-duckdb
provide a docker image with dbt-duckdb
at (/home/runner/work/altimate-code/altimate-code/packages/opencode/test/router/verifier.e2e.test.ts:34:86)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opencode/test/router/verifier.e2e.test.ts` around lines 33 - 37, The
suite currently throws inside beforeAll when the env var IMG (E2E_IMG) is unset,
which fails CI; instead gate the whole suite using the test runner's conditional
skip (e.g., wrap the top-level describe in describe.skipIf(!IMG) or use
describe(…) with conditional describe.skip) so the suite is skipped when
IMG/E2E_IMG is not provided; remove the unconditional throw in beforeAll and
keep the docker image existence check there only when IMG is present (use
Bun.spawnSync as currently done) to fail fast only for explicit runs.
| afterAll(() => { | ||
| for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {} | ||
| }) |
There was a problem hiding this comment.
Drop sudo rm -rf from cleanup.
Bun.spawnSync(["sudo", "rm", "-rf", d]) requires passwordless sudo, is redundant with the following rmSync, and sudo rm -rf is a dangerous footgun if d is ever empty/unexpected. The directories come from mkdtempSync, so the plain rmSync is sufficient.
🛡️ Remove the privileged delete
-afterAll(() => {
- for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {}
-})
+afterAll(() => {
+ for (const d of dirs) try { rmSync(d, { recursive: true, force: true }) } catch {}
+})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| afterAll(() => { | |
| for (const d of dirs) try { Bun.spawnSync(["sudo", "rm", "-rf", d]); rmSync(d, { recursive: true, force: true }) } catch {} | |
| }) | |
| afterAll(() => { | |
| for (const d of dirs) try { rmSync(d, { recursive: true, force: true }) } catch {} | |
| }) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opencode/test/router/verifier.e2e.test.ts` around lines 38 - 40, The
cleanup block uses Bun.spawnSync(["sudo", "rm", "-rf", d]) which is unnecessary
and dangerous; remove the privileged delete call and rely solely on the existing
rmSync(d, { recursive: true, force: true }) inside the afterAll cleanup (the
function referencing afterAll, dirs, Bun.spawnSync and rmSync), preserving the
try/catch to handle errors and keeping use of mkdtempSync-created dirs as the
source of these temp paths.
| test("ANSI color codes around the summary do not break parsing", () => { | ||
| const ansi = "[0m01:00:00 Done. PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12[0m" | ||
| expect(Verifier.parseDbtSummary(ansi)?.pass).toBe(12) | ||
| }) |
There was a problem hiding this comment.
ANSI test uses no ANSI codes — it doesn't exercise the claim.
ansi is plain text, so this passes trivially regardless of whether the parser handles color codes (see related concern in verifier.ts Line 66-73). Add real escape sequences to make this meaningful.
💚 Inject real ANSI sequences
test("ANSI color codes around the summary do not break parsing", () => {
- const ansi = "01:00:00 Done. PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12"
+ const ansi = "01:00:00 \x1b[32mDone.\x1b[0m PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12"
expect(Verifier.parseDbtSummary(ansi)?.pass).toBe(12)
})📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| test("ANSI color codes around the summary do not break parsing", () => { | |
| const ansi = "[0m01:00:00 Done. PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12[0m" | |
| expect(Verifier.parseDbtSummary(ansi)?.pass).toBe(12) | |
| }) | |
| test("ANSI color codes around the summary do not break parsing", () => { | |
| const ansi = "01:00:00 \x1b[32mDone.\x1b[0m PASS=12 WARN=0 ERROR=0 SKIP=0 TOTAL=12" | |
| expect(Verifier.parseDbtSummary(ansi)?.pass).toBe(12) | |
| }) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/opencode/test/router/verifier.test.ts` around lines 81 - 84, The
test "ANSI color codes around the summary do not break parsing" currently uses
plain text in the `ansi` variable so it doesn't actually verify ANSI handling;
update the test in `verifier.test.ts` to include real escape sequences (e.g. use
`\x1b` or `\u001b` sequences like "\x1b[31m" / "\x1b[0m" wrapped around the
summary substring) so that Verifier.parseDbtSummary is exercised with actual
ANSI color codes and still returns pass=12; reference the
`Verifier.parseDbtSummary` call and the `ansi` variable when modifying the test
string.
What does this PR do?
Adds a verifier-gated model router: run a cheap model first, verify the result deterministically (
dbt build), and escalate to a stronger model only when verification fails. Flag-gated viaALTIMATE_ROUTER(default off) — the normal single-model run path is byte-for-byte unchanged.packages/opencode/src/router/verifier.ts— deterministicVerdictfromdbt build/dbt test. Spoof-proof (parses the last summary + usesexitCodeas backstop, so a model-emitted fakeDone. PASS=… ERROR=0cannot pass the gate). PluggableImpl; honestunverifiablestate when a gate can't run.packages/opencode/src/router/router.ts— the escalation ladder. Runs each tier, verifies, escalates on a failed verdict with the exact failing checks as context, stops at the first pass. Per-tier exception handling (a transient tier error escalates instead of aborting). Ladder overridable viaALTIMATE_ROUTER_LADDER.packages/opencode/src/router/policy.ts— where the ladder comes from: a static default, or a per-context policy fetched from the altimate API whenALTIMATE_API_KEYis set (withAbortSignaltimeouts +sanitizeTiersvalidation/capping; degrades to static on any failure).packages/opencode/src/router/verdict.ts— a machine-checkable verdict envelope (schemaVersion, per-attempt history, checks, evidence hash, optional signature).packages/opencode/src/cli/cmd/run.ts— orchestration when the flag is on. Only routes verifiable (dbt) workspaces; a non-dbt project runs once with the user's own model (no silent downgrade).dbt buildruns with a hard timeout.The customer routing policy endpoint lives in
altimate-backend(separate PR). This PR ships the client + the static default.Type of change
Issue for this PR
Closes #853
How did you verify your code works?
test/router/*.test.ts), including adversarial cases: dbt summary-line injection, ANSI/huge/multi-summary output, endpoint response validation + cost-bomb capping, and per-tier exception escalation.test/router/*.e2e.test.ts) run with real dependencies (no mocks): realdbt buildin a container (incl. a spoof-attack that fails to fool the gate), real OpenRouter model calls with real escalation, and real-network policy fallback + adversarial endpoint responses.tsgo --noEmittypecheck clean on all changed files; marker check clean.ALTIMATE_ROUTER=1on a real dbt project): routed to the cheap tier → real agent run → realdbt buildverify → verdict emitted, no escalation needed./consensus:code-review); all CRITICAL/MAJOR findings (non-dbt gating, external-call timeouts, per-tier exception handling, honestunverifiableverdict, endpoint hardening) applied before opening.Checklist
execute()-level follow-ups (reuse one session across tiers;finallycleanup of per-tier listeners/tracer) tracked for a follow-up before marking readySummary by cubic
Adds a verifier‑gated model router that runs a cheap model first, verifies with
dbt build, and escalates only on failure. Flag-gated viaALTIMATE_ROUTER(default off); non‑dbt projects keep the existing single‑model flow.New Features
exitCode; reports failing nodes; returnsunverifiablewhen the gate can’t run.ALTIMATE_API_KEYis set; validated and capped; falls back to static on any failure.dbt buildhas a 300s hard timeout.ALTIMATE_ROUTER,ALTIMATE_ROUTER_LADDER,ALTIMATE_API_KEY,ALTIMATE_API_URL.Migration
ALTIMATE_ROUTER=1.ALTIMATE_API_KEYto use the Altimate policy and outcome reporting.ALTIMATE_ROUTER_LADDER(comma-separatedprovider/model).Written for commit fd63be9. Summary will update on new commits.
Summary by CodeRabbit
New Features
runcommand enables multi-tier policy routing with workspace verification.Documentation
Tests