Skip to content

feat(inspect): add IR-based pipeline reasoning tools (inspect, graph, whatif, lint, catalog, trace, mcp-author)#998

Merged
jamesadevine merged 17 commits into
mainfrom
feat/agent-ir-tools
Jun 15, 2026
Merged

feat(inspect): add IR-based pipeline reasoning tools (inspect, graph, whatif, lint, catalog, trace, mcp-author)#998
jamesadevine merged 17 commits into
mainfrom
feat/agent-ir-tools

Conversation

@jamesadevine

@jamesadevine jamesadevine commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds an agent-facing read-only surface over the typed pipeline IR so agents (in IDE Copilot Chat, sub-agents, or downstream tooling) can reason about pipeline structure and failures programmatically instead of grep-walking compiled .lock.yml files.

Layered design — each layer reusable by the next:

Layer Surface
0 — JSON schema PipelineSummary (schema_version: 1) + GraphSummary etc. in src/compile/ir/summary.rs; Serialize+Deserialize, stable contract.
1a — Public IR entry compile::build_pipeline_ir(path) -> (FrontMatter, Pipeline) — read-only, no YAML written.
1b — CLI commands inspect, graph {dump,deps,outputs}, whatif, lint, catalog, trace.
2 — MCP server New ado-aw mcp-author stdio server exposing every L1 command as a read-only MCP tool for IDE / Copilot Chat.
3 — Audit integration AuditData.pipeline_graph, JobData.{upstream,downstream}_jobs, new "downstream-skipped" finding.

Follow-up fixes from review feedback

  • Fixed whatif condition classification to correctly handle negated predicates (e.g. not(failed()), not(always())) and added targeted regression tests.
  • Reduced audit/trace drift risk by centralizing duplicated job result logic into shared JobData helpers.
  • Updated mcp-author audit_build to write under a temp directory ($TMP/ado-aw/audit) instead of ./logs.
  • Clarified schema stability docs: additive optional fields are non-breaking, but new enum variants currently require a schema-version bump with the present enum encoding.
  • Renamed an internal graph traversal variable in whatif for readability (reversedownstream).

What agents can now do

  • Debug: whatif --fail Detection answers "what skips" without manual graph walking; graph deps SafeOutputs --direction upstream returns the typed transitive dependency set; trace <build> joins ADO timeline with the IR so a failed timeline node surfaces its derived skip-set.
  • Author: lint flags unused outputs / anonymous producers; catalog --kind safe-outputs lists every available tool (24 found); inspect --json returns a stable schema-versioned summary for scripting.
  • IDE: All of the above exposed over MCP via ado-aw mcp-author for VS Code / Claude Desktop.

Trust-boundary preserved

The new mcp-author server is separate from the existing SafeOutputs MCP that ships inside compiled pipelines. Stage 1 sandboxed agents do NOT get these tools — they're author/debug-only surface, intentional.

Docs

  • New docs/mcp-author.md (tool surface + VS Code / Claude Desktop config snippets).
  • New "Public JSON summary (ir::summary)" section in docs/ir.md.
  • New commands in docs/cli.md; pipeline_graph + upstream_jobs / downstream_jobs in docs/audit.md.
  • prompts/debug-ado-agentic-workflow.md and prompts/update-ado-agentic-workflow.md updated to surface the new commands.
  • Follow-up clarification to schema stability wording in docs/ir.md.

Test plan

  • cargo build
  • cargo test --bin ado-aw ✅ (full binary test suite passing after follow-up fixes)
  • cargo test --test inspect_integration4 passed (end-to-end CLI tests on the create-pull-request.md fixture)
  • Targeted regression coverage:
    • cargo test --bin ado-aw inspect::whatif::tests::
    • cargo test --bin ado-aw inspect::trace::tests::
    • cargo test --bin ado-aw audit::findings::tests::
    • cargo test --bin ado-aw mcp_author::tests::
  • cargo clippy --all-targets --all-features0 warnings
  • Manual smoke against tests/safe-outputs/create-pull-request.md:
    • ado-aw inspect ... → text + --json (schema_version: 1)
    • ado-aw graph dump ... --format dot → valid DOT digraph
    • ado-aw graph deps ... SafeOutputs --direction upstream → upstream jobs Agent, Detection
    • ado-aw whatif ... --fail AgentDetection, SafeOutputs skipped
    • ado-aw lint ... → 0 errors
    • ado-aw catalog --kind safe-outputs --json → 24 entries
    • ado-aw mcp-author --help → wired

@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — layered design is clean, trust boundary is maintained, error handling is solid. A few correctness and maintenance issues worth addressing.


Findings

🐛 Bugs / Logic Issues

  • [src/inspect/whatif.rs:classify_condition] The substring check normalized.contains("failed()") yields false positives for not(failed()) (which means "only run when NOT failed" — same semantics as the default succeeded(), so the job would be skipped after an upstream failure). whatif would incorrectly report such a job as runs_anyway. Same problem for not(always()). The fix is to match against the full lowered string or, better, classify against the typed Condition AST before lowering rather than the rendered string.

    // Today — wrong for not(failed()):
    if normalized.contains("always()")
        || normalized.contains("failed()")  // ← also matches not(failed())
  • [src/audit/findings.rs] + [src/inspect/trace.rs] job_failed is copy-pasted verbatim into both files. job_classification (findings.rs) and job_status (trace.rs) are also identical in body. If either drifts (e.g., adding handling for the British-spelling "cancelled"), the other goes stale silently. Extract to a shared helper on JobData or in audit/model.rs.

⚠️ Suggestions

  • [src/mcp_author/mod.rs:audit_build] The audit_build MCP tool hardcodes output: Path::new("./logs"). This writes artifact files relative to whatever CWD the MCP server process has, with no way for the caller to override it. Since mcp-author runs in arbitrary directories, this can scatter cache files unexpectedly. Consider std::env::temp_dir(), a platform cache dir, or exposing an output_dir parameter.

  • [src/inspect/whatif.rs:reachable_edges] The local variable is named reverse but actually holds a forward (producer→consumers) adjacency map. The traversal logic is correct, but the name will trip up future readers. Rename to downstream or forward.

  • [src/compile/ir/summary.rs] The stability-contract doc states: "new enum variants in unknown-tolerant contexts do not require a bump." However StepKind, PoolSummary, and PipelineBodySummary have no #[serde(other)] catch-all or Unknown variant, so any new variant is a silent Deserialize failure for old consumers. Either add #[serde(other)] / Unknown variants to honour the contract, or tighten the doc to say new variants always require a schema-version bump.

✅ What Looks Good

  • Error handling is thorough — anyhow::Result everywhere, no silent unwrap()s on user-facing paths.
  • Trust boundary correctly enforced: mcp-author is fully separate from the in-pipeline SafeOutputs MCP; Stage 1 sandboxed agents do not get these tools.
  • build_pipeline_ir correctly uses skip_integrity: true, consistent with check_pipeline semantics.
  • WhatIfError with typed errors and Levenshtein-based suggestion is a nice DX touch.
  • 21 new unit tests including the cross-fixture round-trip against create-pull-request.md.

Generated by Rust PR Reviewer for issue #998 · sonnet46 4.6M ·

@jamesadevine

Copy link
Copy Markdown
Collaborator Author

@copilot address feedback and suggestions

Copilot AI commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

@copilot address feedback and suggestions

Addressed the feedback in commit c4063bd (plus prior follow-up f5c2c65): fixed whatif negated-condition handling with added tests, deduplicated job status/failure logic via JobData helpers, moved mcp-author audit_build output to a temp dir, clarified IR schema-version docs, and applied the maintainability nits.

@jamesadevine

Copy link
Copy Markdown
Collaborator Author

/rust-review

@github-actions

github-actions Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Rust PR Reviewer failed. Please review the logs for details.

@jamesadevine

Copy link
Copy Markdown
Collaborator Author

/rust-review

@github-actions

github-actions Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Rust PR Reviewer completed successfully!

@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Mostly solid — the layered IR inspection design is clean and well-tested, but there are two security issues and one logic bug worth addressing before merge.


Findings

🔒 Security Concerns

  • src/mcp_author/mod.rs (lines 109–113, 147–150) — PAT leaked via Debug
    TraceFailureParams and AuditBuildParams both #[derive(Debug)] while carrying pat: Option<String>. Any {:?} log or dbg!() call on these structs (including error-path traces in the rmcp framework) prints the PAT in plaintext. Use a newtype wrapper with a redacted Debug impl, or manually impl Debug to mask the pat field.

  • src/audit/pipeline_graph.rs (resolve_source_path, line ~145) — arbitrary file read via crafted build artifact
    The source value is read from aw_info.json inside the downloaded artifact. If path.is_absolute() the function returns it unconditionally — there is no canonicalization or bounds check. Via the audit_build MCP tool an IDE agent that's been prompt-injected by a malicious repo could supply a build whose artifact carries "source": "/home/user/.ssh/id_rsa" (or any other readable file). build_pipeline_ir will fail to parse it as markdown and surface the error, but the file is still opened and read. A canonicalize + prefix check against std::env::current_dir() (or a user-supplied root) would close this.

🐛 Bugs / Logic Issues

  • src/audit/findings.rs (job_name_matches) vs src/inspect/trace.rs (find_runtime_job) — duplicated matching logic
    Both functions implement the same name == id || rsplit('.').next() == id heuristic independently. The PR description says it "centralized duplicated job result logic into shared JobData helpers" but this particular matcher wasn't merged. A future typo fix in one copy won't reach the other. Extract to a shared crate::audit::pipeline_graph::job_name_matches (the module already has timeline_name_matches_job — one of these could absorb the other).

⚠️ Suggestions

  • src/inspect/lint.rs (rule_step_id_collisions) — collision reports point to second duplicate, not first
    The BTreeMap::insert sentinel means the first occurrence is silently accepted and only subsequent ones emit findings. In a two-step collision the output says "step id 'X' used more than once" but identifies the consumer job, not the producer. Swap to BTreeSet<String> to detect on first repeat; store the first-seen &JobSummary in a separate map to include the original location in the message. (Note: the compiler's graph pass already rejects this at compile time, so this is a "belt-and-suspenders" lint — low urgency.)

  • src/inspect/whatif.rs (classify_condition) — undocumented coverage gap
    The classifier only recognizes always(), failed(), succeededOrFailed(). ADO also accepts variable-based conditions such as eq(variables['Agent.JobStatus'], 'Failed'). The safe fallback to Skipped is correct, but users may be surprised when whatif classifies a variable-guarded job as "skipped". A comment (or a lint rule) documenting this limitation would help.

  • src/audit/findings.rs (add_downstream_impact_findings) — _recommendations is unused
    The _recommendations parameter is required by the function signature convention but there's an obvious recommendation to add: "Check the {job} job logs for root cause." Even a one-liner here would make the finding more actionable in the console report.

✅ What Looks Good

  • The is_negated_call even/odd NOT-nesting logic correctly handles double negation (not(not(failed()))RunsAnyway), which was the regression the PR explicitly targeted.
  • Graceful degradation when aw_info.json is absent or the source file isn't locally available is exactly right for an audit-time utility.
  • McpAuthor is correctly excluded from the update-check and pipeline-internal guards.
  • Test coverage is solid — targeted whatif::tests, trace::tests, and findings::tests all cover the logic branches that were hardest to get right.

Generated by Rust PR Reviewer for issue #998 · 247.8 AIC · ⌖ 12.6 AIC · ⊞ 32.5K ·
Comment /rust-review to run again

jamesadevine added a commit that referenced this pull request Jun 14, 2026
…dback

Security:

- mcp_author: manual Debug impls for TraceFailureParams and AuditBuildParams so the optional pat field is no longer leaked via {:?} / dbg!() / rmcp error traces.

- audit::pipeline_graph::resolve_source_path: require .md extension and reject parent-dir / tilde components to close the /home/user/.ssh/id_rsa exfiltration vector flagged by the reviewer. populate_pipeline_graph downgrades a rejection to a warning so legitimate audits still complete.

Maintainability:

- audit::model::JobData: add matches_ir_id helper. findings::add_downstream_impact_findings and inspect::trace::find_runtime_job now both delegate to it instead of carrying duplicate Stage.Job suffix-match heuristics.

- audit::findings::add_downstream_impact_findings: emit a Recommendation pointing operators at the failed-job logs (the _recommendations parameter is no longer unused).

- inspect::lint::rule_step_id_collisions: collision message now names BOTH the first-seen producer location and the colliding consumer (was only pointing at the second occurrence).

- inspect::whatif::classify_condition: doc-comment now records the coverage gap for variable-based conditions (e.g. Agent.JobStatus comparisons) which we deliberately classify as Skipped; new test pins that behaviour.

New tests:

- audit::pipeline_graph: 4 resolver tests covering the rejected attack patterns and the accepted legitimate absolute-path case.

- audit::pipeline_graph::populate_pipeline_graph_records_warning_on_malicious_source — end-to-end test that a malicious source string is recorded as a warning rather than read from disk.

- inspect::whatif::variable_based_condition_is_conservatively_skipped — pins the documented limitation.

Validated locally with cargo build, cargo test (1880 unit tests + integration suites pass), and cargo clippy --all-targets --all-features (clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid layered design with good test coverage — one substantive logic bug in the condition classifier, a security suggestion for the MCP source path, and a lint rule that's effectively dead code.


Findings

🐛 Bugs / Logic Issues

src/inspect/whatif.rsnot(succeededOrFailed()) misclassified as RunsAnyway

"failed()" is a substring of "succeededorfailed()" (it starts at offset 11). The is_negated_call check looks for "not(" at idx - 4, but when "failed()" is matched inside "succeededorfailed()", the four characters before it are "edor", not "not(". So not(succeededOrFailed()) — which should be Skipped — is incorrectly classified as RunsAnyway.

Minimal repro that would fail:

detection.condition = Some("not(succeededOrFailed())".to_string());
// classify_condition returns RunsAnyway, should return Skipped

Fix: add a word-boundary guard before accepting a substring match, e.g. verify that the character immediately before the match is (, a comma, or start-of-string and not a lowercase letter:

fn is_word_boundary_before(s: &str, idx: usize) -> bool {
    idx == 0 || s.as_bytes().get(idx - 1).map_or(true, |&b| !b.is_ascii_alphanumeric())
}

Then in contains_unnegated_call, skip any hit where !is_word_boundary_before(normalized_condition, idx).

The PR already tests not(failed()), not(always()), and not(not(failed())), so a test for not(succeededOrFailed())Skipped should be added alongside the fix.


🔒 Security Concerns

src/mcp_author/mod.rssource_path() accepts arbitrary file paths

fn source_path(path: &str) -> PathBuf {
    PathBuf::from(path)  // no validation
}

All nine MCP tools that accept source_path reach build_pipeline_ir (which reads the file), and none validate the path first. A crafted MCP request could supply ../../other-project/secrets.md and the IR summary — including env-var names, step names, and any secrets embedded in bash step bodies — would be returned in the tool result.

audit/pipeline_graph.rs's resolve_source_path already models a minimal policy (.md extension, no .., no leading ~). Applying the same guard in source_path() would be consistent and adds defense-in-depth for IDE integrations that proxy LLM-generated requests.


⚠️ Suggestions

src/inspect/lint.rsrule_missing_is_output is effectively unreachable

// TODO: This should remain quiet while PipelineSummary patches
// auto_is_output from the graph.
if !decl.auto_is_output {
    findings.push(...)  // never fires
}

PipelineSummary::from_pipeline (in summary.rs) already patches every affected output's auto_is_output = true before lint() is ever called. The guard means this rule will never emit a finding in normal usage. The TODO is noted but it leaves rule_missing_is_output as silent dead code. Consider either removing the body and leaving a doc comment explaining the invariant, or making the rule unconditional (relying on the summary having been built correctly) so it can catch future drift.

src/inspect/mod.rs — stale module doc comment

The doc comment lists trace.rs, whatif.rs, lint.rs, and catalog.rs under "Future siblings (not yet landed)" — but all four are present in this PR. The comment should be updated to reflect what's actually in the module.


✅ What Looks Good

  • Security in resolve_source_path: the .md-extension requirement + ..-component rejection + ~-prefix guard (with inline threat-model commentary) is exactly the right approach for an untrusted audit artifact field.
  • PAT redaction in Debug impls: masking pat in TraceFailureParams and AuditBuildParams prevents accidental token exposure in log lines or MCP error traces.
  • Condition double-negation handling: the not(not(failed()))RunsAnyway test (and the toggle logic in is_negated_call) is handled correctly and regression-tested.
  • JobData centralization: pulling failed() and matches_ir_id() into audit::model::JobData instead of duplicating across findings and trace is a good DRY call.
  • Graceful degradation in populate_pipeline_graph: recording a warning rather than erroring when source metadata isn't locally available is the right UX for auditing cross-machine builds.

Generated by Rust PR Reviewer for issue #998 · 642.6 AIC · ⌖ 13 AIC · ⊞ 32.9K ·

@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid layered design with good security discipline in most places; a few logic issues worth fixing before merge.


Findings

🐛 Bugs / Logic Issues

src/audit/pipeline_graph.rs:timeline_name_matches_job — overly-broad suffix fallback

// Branch 3 (the fallback):
timeline_name
    .rsplit('.')
    .next()
    .is_some_and(|suffix| suffix == job_id)

rsplit takes the last dot-separated token, so a timeline name like Stage1.SubStage.Agent would match job_id = "Agent". Branch 2 (format!("{stage}.{job_id}")) already handles the canonical Stage.Job form. The fallback is effectively dead code for all known pipeline shapes but could attach wrong IR edges to timeline entries in unusual configurations. Consider replacing branch 3 with strict Stage.Job parsing or removing it entirely:

// More precise: only accept "anything.job_id" (single-level prefix)
if let Some((_, suffix)) = timeline_name.rsplit_once('.') {
    suffix == job_id
} else {
    false
}

src/audit/cli.rs:fetch_audit_data_innerderive_findings called in cache path but not before save

When the result comes from a fresh download, findings::derive_findings is called (correct). When the result comes from cache, derive_findings is also called after populate_pipeline_graph. But the saved run-summary.json snapshot was written before this run's pipeline-graph pass, so the downstream-skipped finding and the new upstream_jobs/downstream_jobs job fields are never persisted. On the next cached load they're recomputed in memory but not saved back. That means audit --json from cache could silently produce a different AuditData shape than the canonical saved summary. This is unlikely to cause a hard failure but may confuse tooling that compares successive outputs.


🔒 Security Concerns

src/mcp_author/mod.rs:source_path — no path sanitization for an externally-routed MCP server

fn source_path(path: &str) -> PathBuf {
    PathBuf::from(path)
}

audit/pipeline_graph.rs:resolve_source_path validates the source value from untrusted build artifacts with .. rejection and a mandatory .md extension guard. The same value arriving over the mcp-author stdio channel gets no equivalent treatment. The module comment notes "no workspace bounding directory: callers run it locally as the invoking user" — that's an intentional design choice, but it means a compromised or misconfigured IDE extension (or a prompt-injected model call) could feed arbitrary paths to build_pipeline_ir, which calls tokio::fs::read_to_string on the raw path before any parsing. Because build_pipeline_ir then sanitizes and validates front matter, the content leak is bounded to whatever can be read (parsing will fail for non-markdown files), but the file is read. Given that the SafeOutputs MCP server has thorough input sanitization, applying at minimum a .. / tilde guard and an .md extension check here would be consistent with project conventions and cheap to add.


⚠️ Suggestions

src/mcp_author/mod.rs:audit_build — shared temp directory for concurrent calls

let output = std::env::temp_dir().join("ado-aw").join("audit");

All audit_build invocations write artifacts under the same base directory. Two concurrent calls for different builds are safe (cache subdirectories are keyed by build ID). Two concurrent calls for the same build with no_cache: true will both write to the same path, with the second write potentially overwriting partially-written artifacts from the first. Including the build ID in the output path — or using a per-invocation tmp directory for the no_cache: true case — would close this race.


src/compile/mod.rs:build_pipeline_irskip_integrity = true for all targets without a caller note

The function hardcodes skip_integrity = true for every compile target. That's correct for a read-only IR view, but the parameter name at each call site carries no documentation (positional true, false in a 8-arg function call). A named parameter, a brief comment, or at minimum a // skip_integrity inline comment at each call would prevent a future caller from cargo-culting a copy with the wrong value. This is purely ergonomic.


✅ What Looks Good

  • PAT redaction in TraceFailureParams and AuditBuildParams: manual Debug impls that emit <redacted> are exactly right — framework traces and logs never see the token.
  • resolve_source_path security analysis: the inline doc explicitly calls out the threat model, the mitigations (.md extension, .. rejection, ~ rejection), and the residual risk. This is well-reasoned and the code matches the doc.
  • is_negated_call + word-boundary guard: the combination correctly handles not(failed()) as Skipped, not(not(failed())) as RunsAnyway, and avoids the false match of failed() inside succeededorfailed(). The coverage-limitations comment is honest about what the static classifier can't see.
  • Job(redacted) / classification()/matches_ir_id()`: centralising these helpers avoids the dual-definition drift that prompted the follow-up fix mentioned in the PR description.
  • Trust boundary: mcp-author and the SafeOutputs MCP server share zero write-capable code paths. Stage 1 agents get no IR-query tools. This is exactly the right separation.
  • Schema stability contract (SCHEMA_VERSION, additive-vs-breaking guidance, enum-variant caveat): clear and actionable for downstream tooling authors.

Generated by Rust PR Reviewer for issue #998 · 441.3 AIC · ⌖ 12.3 AIC · ⊞ 32.9K ·

jamesadevine added a commit that referenced this pull request Jun 14, 2026
…dback

Bugs:
- audit::pipeline_graph::timeline_name_matches_job and
  audit::model::JobData::matches_ir_id: tighten the dotted-name suffix
  fallback so multi-level names like `Stage1.SubStage.Agent` no longer
  spuriously match against the bare id `Agent`. The fallback now only
  accepts a single-level `Stage.Job` qualifier (rsplit_once + prefix
  must contain no dot). New regression tests pin the behaviour.
- audit::cli::fetch_audit_data_inner: extract the post-fetch enrichment
  (pipeline_graph + metrics + derive_findings) into a single helper so
  the fresh-download and cache-load paths both produce structurally
  identical AuditData. The cache-load path now persists the recomputed
  data back to run-summary.json when it changed, so subsequent runs see
  the canonical shape and tooling diffing successive outputs is no
  longer confused by drift between the saved file and in-memory result.

Suggestions:
- mcp_author::audit_build: route `no_cache: true` invocations through a
  per-invocation `tempfile::tempdir()` so two concurrent calls for the
  same build can no longer race on partially-written artifacts in the
  shared `${TEMP}/ado-aw/audit/build-<id>` directory. Cached calls
  continue to use the shared cache root so warm-cache reads still
  benefit from cross-invocation reuse.
- compile::build_pipeline_ir: add `/* skip_integrity */` and
  `/* debug_pipeline */` inline-name comments to every dispatched
  target-builder call (only Standalone had them) so future callers
  cannot cargo-cult the wrong positional bool.

Validated locally: cargo build, cargo test (1884 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — well-structured layered design with solid security defaults. A few items worth addressing before merge.

Findings

⚠️ Suggestions

  • src/inspect/whatif.rsclassify_condition false positives on string literals
    The condition classifier normalizes and text-searches the lowered ADO condition string. A condition like eq(variables['Agent.JobStatus'], 'Failed') (as already tested in the regression suite) does contain the token failed but failed() (with the ()) won't match, so that specific case is handled correctly. However, a condition string like eq(variables['result'], 'failed()') — i.e., the literal string "failed()" in a comparison — would produce a false RunsAnyway classification. Since the PR documents this as a "static classification" best-effort tool and the conditions are compiler-generated rather than raw user input, this is acceptable, but it's worth a short doc comment on classify_condition noting the limitation.

  • src/audit/cli.rs — cache write race on concurrent audit runs
    The new derive_post_processing path re-persists the updated AuditData back to the cache when post-processing changes it (to avoid drift between the in-memory result and the stored file). Two concurrent ado-aw audit invocations for the same build ID will race on save_run_summary. The failure path is already handled gracefully (downgraded to a warning), so this is low severity, but a quick advisory comment at the site would prevent future confusion.

  • src/audit/pipeline_graph.rs — absolute paths pass resolve_source_path with .md extension
    The security comment above the function is truncated in the diff ("...could carry "source": "/home/user/.ssh/id). The effective guard is the .md extension check: a crafted aw_info.json with "source": "/home/user/something.md" reaches build_pipeline_ir. That function is read-only and fails gracefully on invalid markdown, so the practical blast radius is an unexpected parse error, not code execution. Still, it's worth completing the comment to document the residual risk so future maintainers don't weaken the extension check without understanding it's the primary gate for absolute paths.

  • src/inspect/cli.rsdispatch_catalog returns anyhow::Result but is called with ? in a main that exits 0 on success
    This is fine as-is, but the lint command uses std::process::exit(1) to signal lint failures — a pattern that bypasses any Drop implementations. Since build_pipeline_ir uses tokio async I/O, any open resources are runtime-managed and won't leak, but it's worth a comment that exit(1) is intentional (it mirrors how tsc --noEmit / eslint signal failure to CI).

✅ What Looks Good

  • Path-traversal defense in resolve_source_path: Rejecting .. components, ~ prefix, and anything without .md extension from build-artifact aw_info.json is exactly right. The source path is untrusted data from a downloaded artifact.
  • Custom Debug impls for PAT fields: TraceFailureParams and AuditBuildParams both have hand-written Debug that emits <redacted> for the pat field — the right call since rmcp uses {:?} in error traces. (The Commands enum uses #[derive(Debug)] too, but that's a pre-existing pattern shared with Audit, Run, Enable, etc.)
  • is_negated_call correctness: Double-negation (not(not(failed()))) correctly yields RunsAnyway, and the word-boundary check prevents failed() from matching inside succeededorfailed(). The regression tests cover all four cases.
  • reachable_edges BFS: The seen set correctly breaks cycles. Correct.
  • unwrap()/expect() usage: All instances are inside #[cfg(test)] modules — none on production paths.
  • Trust boundary: mcp-author stdio server is intentionally separate from the SafeOutputs MCP used by sandboxed Stage 1 agents. The PR description calls this out explicitly, and the implementation honours it.
  • to_mcp_error full error chain: Exposing {error:#} to the MCP client is fine here — the IDE/Copilot Chat consumer is the same user who owns the local process. Would be a concern in a multi-tenant context but not here.

Generated by Rust PR Reviewer for issue #998 · 687.1 AIC · ⌖ 13.2 AIC · ⊞ 32.9K ·

jamesadevine added a commit that referenced this pull request Jun 14, 2026
…998 round 4

All four items in the latest review are documentation-only — the
runtime behaviour is already correct and the reviewer asked for
explicit comments so future maintainers do not weaken the guards or
re-introduce the fixed bugs by accident.

- inspect::whatif::classify_condition: doc-comment now explicitly
  flags the string-literal false-positive (e.g. a condition like
  eq(variables['result'], 'failed()') would match the literal
  substring) alongside the existing variable-condition limitation.
  This is acceptable because ADO conditions are compiler-generated,
  but the residual gap is now documented.

- audit::cli::fetch_audit_data_inner cache write path: add an inline
  comment recording the deliberate lack of filesystem locking on
  save_run_summary. Two concurrent ado-aw audit runs for the same
  build can race on this write; the failure is recorded as a warning
  rather than aborting the audit, and both writers derive from the
  same on-disk artifacts so the resulting summary stays internally
  consistent.

- audit::pipeline_graph::resolve_source_path: complete the previously
  truncated security comment so the residual risk is unambiguous —
  the .md extension check is the primary gate for absolute paths, and
  weakening or removing it without adding a containment check would
  silently re-open the arbitrary-file-read vector flagged earlier.

- main: comment on the std::process::exit(1) in the Lint dispatch arm
  explaining the intentional Drop-bypass (mirrors tsc --noEmit /
  eslint) and that the only resources in scope are runtime-managed
  async I/O.

No behavioural changes. Validated locally with cargo build, cargo
test --bin ado-aw (1884 unit tests pass), and cargo clippy
--all-targets --all-features (clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Solid layered design — the IR summary shape is clean, trust boundary is well-maintained, PAT redaction is correct, and the path-validation work in pipeline_graph.rs is thorough. Two real issues worth addressing before merge.


Findings

🐛 Bugs / Logic Issues

src/inspect/cli.rs:build_trace — hardcodes Path::new("./logs")

// src/inspect/cli.rs ~line 205
let audit = crate::audit::fetch_audit_data(crate::audit::AuditOptions {
    output: Path::new("./logs"),   // ← still hardcoded
    ...
})

The audit_build MCP tool was correctly fixed to use $TMP/ado-aw/audit, but build_trace was not. The MCP trace_failure tool calls inspect::build_trace directly:

// mcp_author/mod.rs
let (_audit, report) = inspect::build_trace(&opts).await.map_err(to_mcp_error)?;

This means every trace_failure call from an IDE silently creates ./logs/ in the server's working directory — the same unwanted side-effect that was fixed in audit_build. TraceOptions also lacks an output field, so callers have no way to override it.

Simplest fix: default to the same std::env::temp_dir().join("ado-aw").join("audit") root inside build_trace, or add an output: Option<&'a Path> field to TraceOptions.


src/audit/findings.rs:add_downstream_impact_findings — misleading "skipped" title

title: format!("Downstream jobs skipped due to {} failure", job.name),

job.downstream_jobs contains all IR-derived graph successors, not just jobs that were actually skipped at runtime. When a downstream job has always() (e.g. a cleanup job), it runs and its classification() returns "succeeded" — but the finding title still says "skipped". A finding reading "Downstream jobs skipped due to Agent failure: CleanupJob: succeeded" is contradictory.

The description body is fine (it shows the real runtime classification). The title should be more neutral, e.g. "Downstream jobs impacted by {} failure", or the finding should be suppressed for downstream jobs whose result is not "skipped"/"cancelled".


⚠️ Suggestions

src/inspect/graph_query.rs:render_dot — visual direction vs. execution order

out.push_str("  rankdir=LR;\n");  // left-to-right
// ...
out.push_str(&format!("  \"{}\" -> \"{}\";\n", cons, prod)); // consumer → producer

The combination of rankdir=LR and consumer→producer edges means the DOT graph lays out SafeOutputs → Detection → Agent from left to right — execution order runs right-to-left. Most users piping to dot -Tsvg expect to see Agent first on the left. Consider using rankdir=RL (which would put Agent on the left), or flipping to producer→consumer edges with rankdir=LR. The inline comment documents the current convention, which is enough for awareness, but the rendered output surprises most viewers.


✅ What Looks Good

  • PAT redaction via manual Debug impls for TraceFailureParams and AuditBuildParams is correct and tested.
  • Path validation in resolve_source_path (audit/pipeline_graph.rs): the .md-extension gate, ParentDir rejection, and tilde check are all present, documented, and covered by targeted tests — including the malicious_source test that verifies the warning path.
  • is_negated_call fix: the updated contains_unnegated_call with word-boundary guard correctly handles not(failed()), not(not(failed())), and the failed() inside succeededorfailed() substring case. All regression tests pass.
  • build_pipeline_ir public API: codemods in-memory only, no disk writes — correct parity with check_pipeline.
  • **Job(redacted) centralization**: single-level Stage.Jobguard is correct and tested; the oldrsplit('.').next()` multi-level bug is explicitly covered.

Generated by Rust PR Reviewer for issue #998 · 501.1 AIC · ⌖ 12.8 AIC · ⊞ 32.9K ·

@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Well-designed layered IR tooling with good test coverage — two issues worth addressing before merge: an unvalidated path in the new MCP server and a very noisy lint rule.

Findings

🔒 Security Concerns

  • src/mcp_author/mod.rs:377-379source_path() accepts arbitrary filesystem paths from MCP clients without validation

    fn source_path(path: &str) -> PathBuf {
        PathBuf::from(path)  // no checks at all
    }

    Every inspect_*, graph_*, whatif, and lint_workflow tool passes the caller-supplied source_path string directly to build_pipeline_ir(), which opens and reads the file. Compare this to audit/pipeline_graph.rs, which carefully validates the same source field from untrusted aw_info.json (.md extension check, .. component check, ~ prefix check). The mcp-author server is intentionally local-only, but it runs as an MCP server available to any connected AI agent — a prompt-injected agent could instruct an IDE session to call inspect_workflow with source_path = "/home/user/notes.md" or source_path = "../../secrets.md".

    The fix is straightforward: apply the same .md-extension guard and ..-component rejection that resolve_source_path() already implements. The doc-comment in pipeline_graph.rs explicitly says "Do not weaken or remove the .md extension check" — the MCP server should honour the same invariant. An absolute-path allowlist or cwd-containment check would be even better for an author-local server.

🐛 Bugs / Logic Issues

  • src/inspect/lint.rs:231-246rule_no_condition_references fires for every normal job with depends_on

    The rule emits an Info for every job that has depends_on but no explicit condition:. In the canonical Setup → Agent → Detection → SafeOutputs pipeline, this produces three findings every time (Agent, Detection, SafeOutputs all depend on their predecessor with no explicit condition). This is the correct and expected ADO behaviour — succeeded() is intentionally the default. Running ado-aw lint on any real pipeline will always emit these findings, making them indistinguishable from genuine issues.

    The rule seems to be intended to flag jobs that should declare a condition (e.g. a SafeOutputs-style job that wants to run after failure). As written, it documents normal ADO semantics rather than finding problems. Consider restricting it to jobs that appear to be in a failure-handling pattern (e.g. reference always() in their stage description, or sit after a job that explicitly has always()/failed()), or simply remove it — it adds noise without signal.

⚠️ Suggestions

  • src/inspect/whatif.rs:357-364, src/inspect/lint.rs:315-322, src/audit/pipeline_graph.rs:139-146all_jobs() defined three times identically

    Each module carries its own copy:

    fn all_jobs(summary: &PipelineSummary) -> Vec<&JobSummary> {
        match &summary.body { ... }
    }

    These could drift independently. The PipelineSummary type lives in src/compile/ir/summary.rs; a pub fn all_jobs() method or free function there would keep a single source of truth.

  • src/main.rsado-aw audit --output default changed from ./logs to $TEMP/ado-aw/audit (breaking)

    Silently relocating the default output directory is a breaking change for any scripts that relied on ./logs being populated. The PR description notes this as intentional, but it's worth a CHANGELOG entry or a v0.x.0 bump note so operators aren't surprised.

✅ What Looks Good

  • The resolve_source_path security story in audit/pipeline_graph.rs is thorough — documented residual risk, tested for non-.md absolutes, .. traversal, and ~ prefix. The tests for the malicious-aw_info.json path are especially useful.
  • classify_condition / contains_unnegated_call in whatif.rs correctly handles not(failed()), not(not(failed())), and the failed() substring inside succeededOrFailed(). The regression tests for each of those cases are exactly what's needed.
  • PAT redaction in TraceFailureParams / AuditBuildParams Debug impls is a good safety habit.
  • The build_pipeline_ir public entry point propagates errors cleanly with anyhow::Context throughout; all unwrap()s in the diff are confined to #[cfg(test)] modules or are panic-safe constants (regex compilation, StepId::new on known literals).

Generated by Rust PR Reviewer for issue #998 · 371.2 AIC · ⌖ 12.6 AIC · ⊞ 32.9K ·

jamesadevine and others added 9 commits June 14, 2026 21:59
… whatif, lint, catalog, trace, mcp-author)

Adds an agent-facing read-only surface over the typed pipeline IR so
agents can reason about pipeline structure and failures programmatically.

- New public summary types in `src/compile/ir/summary.rs` (PipelineSummary,
  GraphSummary, etc.) with `schema_version` pinned at 1.
- New `compile::build_pipeline_ir(path) -> (FrontMatter, Pipeline)`
  read-only entry point that builds the typed IR without writing YAML.
- New `src/inspect/` module + CLI subcommands:
    * `inspect <source>`             — pipeline summary (text/json)
    * `graph dump|deps|outputs`      — resolved dependency graph
    * `whatif <source> --fail <id>`  — static reachability over conditions
    * `lint <source>`                — structural checks
    * `catalog --kind <k>`           — safe-outputs/runtimes/tools/engines/models
    * `trace <build-id-or-url>`      — joins ADO timeline with the IR graph
- New `src/mcp_author/` stdio MCP server exposing every command above as
  a read-only MCP tool for IDE / Copilot Chat integration.
- Audit integration: `AuditData.pipeline_graph`, `JobData.upstream_jobs` /
  `downstream_jobs`, and a new finding kind for downstream-skip impact.
- Docs: `docs/mcp-author.md`, public-JSON-summary section in `docs/ir.md`,
  new commands in `docs/cli.md`, audit fields in `docs/audit.md`, prompt
  updates for debug + update workflows.

Build: `cargo build`, `cargo test` (1871 + 4 passed), `cargo clippy` (0 warnings).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Windows-style hardcoded path failed on Linux CI. Resolve from

CARGO_MANIFEST_DIR so the test works on both platforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
Co-authored-by: jamesadevine <4742697+jamesadevine@users.noreply.github.com>
…dback

Security:

- mcp_author: manual Debug impls for TraceFailureParams and AuditBuildParams so the optional pat field is no longer leaked via {:?} / dbg!() / rmcp error traces.

- audit::pipeline_graph::resolve_source_path: require .md extension and reject parent-dir / tilde components to close the /home/user/.ssh/id_rsa exfiltration vector flagged by the reviewer. populate_pipeline_graph downgrades a rejection to a warning so legitimate audits still complete.

Maintainability:

- audit::model::JobData: add matches_ir_id helper. findings::add_downstream_impact_findings and inspect::trace::find_runtime_job now both delegate to it instead of carrying duplicate Stage.Job suffix-match heuristics.

- audit::findings::add_downstream_impact_findings: emit a Recommendation pointing operators at the failed-job logs (the _recommendations parameter is no longer unused).

- inspect::lint::rule_step_id_collisions: collision message now names BOTH the first-seen producer location and the colliding consumer (was only pointing at the second occurrence).

- inspect::whatif::classify_condition: doc-comment now records the coverage gap for variable-based conditions (e.g. Agent.JobStatus comparisons) which we deliberately classify as Skipped; new test pins that behaviour.

New tests:

- audit::pipeline_graph: 4 resolver tests covering the rejected attack patterns and the accepted legitimate absolute-path case.

- audit::pipeline_graph::populate_pipeline_graph_records_warning_on_malicious_source — end-to-end test that a malicious source string is recorded as a warning rather than read from disk.

- inspect::whatif::variable_based_condition_is_conservatively_skipped — pins the documented limitation.

Validated locally with cargo build, cargo test (1880 unit tests + integration suites pass), and cargo clippy --all-targets --all-features (clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…is-output lint + refresh inspect mod doc

Bugs:
- inspect::whatif::contains_unnegated_call: add a word-boundary guard
  so the canonical-marker search no longer matches `failed()` inside
  `succeededorfailed()`. Previously `not(succeededOrFailed())` was
  wrongly classified as RunsAnyway because the inner `failed()` match
  started at offset 11 and the four chars before it were `edor`, never
  `not(`. New is_word_boundary_before helper + regression test pinning
  the corrected behaviour.
- inspect::lint::rule_missing_is_output: drop the TODO-guarded
  auto_is_output check that made the rule unreachable in normal usage.
  The rule now consistently fires when a cross-step consumer lacks
  isOutput, catching future drift between summary patching and graph
  codegen. Doc-comment explains the invariant.

Docs:
- inspect::mod doc-comment: drop the stale "Future siblings (not yet
  landed)" section -- trace/whatif/lint/catalog/graph_deps/graph_outputs
  are all present in this PR. The refreshed layout list documents every
  module currently in the inspect tree.

Validated locally: cargo build, cargo test --bin ado-aw (1881 unit
tests pass), cargo clippy --all-targets --all-features (clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…dback

Bugs:
- audit::pipeline_graph::timeline_name_matches_job and
  audit::model::JobData::matches_ir_id: tighten the dotted-name suffix
  fallback so multi-level names like `Stage1.SubStage.Agent` no longer
  spuriously match against the bare id `Agent`. The fallback now only
  accepts a single-level `Stage.Job` qualifier (rsplit_once + prefix
  must contain no dot). New regression tests pin the behaviour.
- audit::cli::fetch_audit_data_inner: extract the post-fetch enrichment
  (pipeline_graph + metrics + derive_findings) into a single helper so
  the fresh-download and cache-load paths both produce structurally
  identical AuditData. The cache-load path now persists the recomputed
  data back to run-summary.json when it changed, so subsequent runs see
  the canonical shape and tooling diffing successive outputs is no
  longer confused by drift between the saved file and in-memory result.

Suggestions:
- mcp_author::audit_build: route `no_cache: true` invocations through a
  per-invocation `tempfile::tempdir()` so two concurrent calls for the
  same build can no longer race on partially-written artifacts in the
  shared `${TEMP}/ado-aw/audit/build-<id>` directory. Cached calls
  continue to use the shared cache root so warm-cache reads still
  benefit from cross-invocation reuse.
- compile::build_pipeline_ir: add `/* skip_integrity */` and
  `/* debug_pipeline */` inline-name comments to every dispatched
  target-builder call (only Standalone had them) so future callers
  cannot cargo-cult the wrong positional bool.

Validated locally: cargo build, cargo test (1884 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…998 round 4

All four items in the latest review are documentation-only — the
runtime behaviour is already correct and the reviewer asked for
explicit comments so future maintainers do not weaken the guards or
re-introduce the fixed bugs by accident.

- inspect::whatif::classify_condition: doc-comment now explicitly
  flags the string-literal false-positive (e.g. a condition like
  eq(variables['result'], 'failed()') would match the literal
  substring) alongside the existing variable-condition limitation.
  This is acceptable because ADO conditions are compiler-generated,
  but the residual gap is now documented.

- audit::cli::fetch_audit_data_inner cache write path: add an inline
  comment recording the deliberate lack of filesystem locking on
  save_run_summary. Two concurrent ado-aw audit runs for the same
  build can race on this write; the failure is recorded as a warning
  rather than aborting the audit, and both writers derive from the
  same on-disk artifacts so the resulting summary stays internally
  consistent.

- audit::pipeline_graph::resolve_source_path: complete the previously
  truncated security comment so the residual risk is unambiguous —
  the .md extension check is the primary gate for absolute paths, and
  weakening or removing it without adding a containment check would
  silently re-open the arbitrary-file-read vector flagged earlier.

- main: comment on the std::process::exit(1) in the Lint dispatch arm
  explaining the intentional Drop-bypass (mirrors tsc --noEmit /
  eslint) and that the only resources in scope are runtime-managed
  async I/O.

No behavioural changes. Validated locally with cargo build, cargo
test --bin ado-aw (1884 unit tests pass), and cargo clippy
--all-targets --all-features (clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…crate::audit::default_cache_root

Previously the `./logs` path was hardcoded in three places and the
default cache directory disagreed between entry points:

- The CLI `audit` command defaulted `--output` to `./logs`.
- The CLI `trace` command and `inspect::build_trace` opened
  `Path::new("./logs")` directly.
- The mcp-author `audit_build` tool used
  `${TEMP}/ado-aw/audit` while the mcp-author `trace_failure` tool
  inherited the `./logs` default — silently scattering log
  directories under arbitrary IDE working directories.

Introduce `crate::audit::default_cache_root()` returning
`${TEMP}/ado-aw/audit` on every platform, re-exported from
`crate::audit`. Every entry point now resolves to that helper:

- CLI `audit`: `--output` is `Option<PathBuf>`, unset → helper.
- CLI `trace`: passes `output: None` so `inspect::build_trace`
  resolves to the helper.
- `inspect::build_trace`: default for `TraceOptions::output: None`
  is the helper.
- mcp-author `audit_build`: uses the helper for normal calls and
  layers a per-invocation `tempfile::tempdir()` on top when
  `no_cache: true`.
- mcp-author `trace_failure`: passes `output: None`.

Docs (`docs/audit.md`, `docs/cli.md`,
`site/src/content/docs/setup/cli.mdx`,
`site/src/content/docs/reference/audit.mdx`,
`prompts/debug-ado-agentic-workflow.md`) updated to reflect the
new default and to note that the cache is shared across every
audit entry point.

Validated locally: cargo build, cargo test (1884 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesadevine jamesadevine force-pushed the feat/agent-ir-tools branch from f775a20 to 7660b06 Compare June 14, 2026 21:01
… noisy lint rule + restore ./logs CLI default

Addresses PR #998 round 6:

- Drop `inspect::lint::rule_no_condition_references`. The rule fires
  an Info finding for every job that has `depends_on` with no
  explicit `condition:` — i.e. every non-Setup job in the canonical
  Setup → Agent → Detection → SafeOutputs shape. ADO's default
  `succeeded()` semantics are intentional, not a bug, so the rule
  documented normal behaviour rather than finding problems and made
  genuine findings harder to spot. Removed entirely; the
  serialisation roundtrip test that referenced the rule's code
  string still passes because the literal value is independent of
  rule registration.

- Centralise `all_jobs()` as `PipelineSummary::all_jobs`. Three
  copies (in `audit::pipeline_graph`, `inspect::lint`, and
  `inspect::whatif`) collapsed into a single method on the public
  IR summary type. Call sites in those three modules plus
  `inspect::trace::stage_for_job` now go through the method. Future
  body-shape additions only need updating in one place.

- Revert the CLI `ado-aw audit --output` default from
  `${TEMP}/ado-aw/audit` back to `./logs` to preserve backward
  compatibility for scripts that consume the documented `./logs`
  location. Non-CLI callers — `inspect::build_trace` (used by the
  CLI `trace` command and the mcp-author `trace_failure` tool) and
  the mcp-author `audit_build` tool — continue to anchor under
  `crate::audit::default_cache_root()` so they do not scatter
  `./logs/` directories under arbitrary IDE working directories.
  Docs (`docs/audit.md`, `docs/cli.md`, both site mdx files, and the
  debug prompt) updated to reflect the per-entry-point split.

Validated locally: cargo build, cargo test (1887 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — well-layered design, strong test coverage, and commendable security hygiene throughout. Two issues worth addressing before merge.


Findings

🐛 Bugs / Logic Issues

  • src/inspect/whatif.rsis_negated_call panics on non-ASCII char boundaries

    while idx >= NOT_PREFIX_LEN
        && normalized_condition[idx - NOT_PREFIX_LEN..idx] == *NOT_PREFIX

    normalized_condition[idx - 4..idx] is a str byte-range slice. Rust panics if either bound isn't on a UTF-8 char boundary. call_idx is guaranteed to be on a char boundary (it comes from .find()), but idx - 4 is only safe when the four preceding bytes are all ASCII. If a condition string ever contains a multi-byte UTF-8 sequence immediately before not( — unlikely but possible if a job display name leaks in — this panics at runtime.

    Fix: compare byte slices instead:

    && normalized_condition.as_bytes().get(idx - NOT_PREFIX_LEN..idx)
        == Some(NOT_PREFIX.as_bytes())

    This is always bounds-safe, no char-boundary requirement.

  • src/audit/pipeline_graph.rs:resolve_source_path — symlink bypasses the .md extension gate

    For absolute paths, the only check is the file extension:

    if path.is_absolute() {
        return Ok(path);  // path ends in .md → accepted
    }

    A symlink at /tmp/foo.md → /etc/passwd passes this check and proceeds to build_pipeline_ir which will attempt to read it. The blast radius is just a parse error (as your security comment correctly notes), but the existing comment says "Do not weaken or remove the .md extension check without also adding a containment check" — the symlink vector goes unmentioned and is a silently open gap against any "add a safeguard" intent. Consider adding tokio::fs::symlink_metadata or noting symlinks explicitly in the security comment so future maintainers don't miss it.


🔒 Security Concerns

  • src/audit/cli.rs:default_cache_root — world-readable path on multi-user Linux

    pub fn default_cache_root() -> PathBuf {
        std::env::temp_dir().join("ado-aw").join("audit")
    }

    On Linux without pam_tmpdir, std::env::temp_dir() returns /tmp, which is world-readable. Build artifacts and pipeline summaries (job names, ADO org/project, finding descriptions) written under /tmp/ado-aw/audit are visible to any local user. The old ./logs default was CWD-relative and under caller control. The MCP-server motivation for a shared root is sound; consider at minimum creating the directory with mode 0700 on first use, or using dirs::cache_dir() which gives ~/.cache/ado-aw on XDG Linux (always user-private).


⚠️ Suggestions

  • src/inspect/lint.rs:consumed_outputs — considers only cross-step refs, not within-step condition refs

    consumed_outputs is built from graph.outputs_needing_is_output, which captures only outputs consumed by other steps (cross-step refs that need isOutput=true). If a step declares an output and that output is only referenced in the same step's own condition: (an edge case, but representable in the IR), the output won't appear in outputs_needing_is_output, so rule_unused_output will fire a false-positive warning. Low practical impact — just worth a // NOTE: comment in consumed_outputs so future contributors know the scope.

  • src/audit/pipeline_graph.rs:find_matching_job_summary — no stage disambiguation

    find_matching_job_summary finds the first JobSummary whose id matches; in a staged pipeline, two stages could theoretically have jobs with the same id (though the IR graph pass rejects this). A defensive assert or a stage-aware match would make the intent explicit and protect against future changes to uniqueness guarantees.


✅ What Looks Good

  • PAT redaction in Debug impls — Both TraceFailureParams and AuditBuildParams implement custom Debug that calls redacted_pat. This is exactly right; it means {:?} and any framework error trace never leaks credentials.
  • timeline_name_matches_job — correct single-level guard — Rejecting Stage1.SubStage.Agent via !prefix.contains('.') is the right call; the regression test for multi-level qualified names confirms the intent.
  • Security comment in resolve_source_path — The ## Residual risk block is exemplary documentation; the blast-radius analysis and the "do not remove without adding containment" warning are exactly what security reviewers need.
  • derive_post_processing refactor — centralising findings::derive_findings + pipeline_graph::populate_pipeline_graph so both fresh-download and cache-hit paths produce structurally identical AuditData eliminates a whole class of cache-vs-live drift bugs.
  • whatif word-boundary guard — the is_word_boundary_before check preventing failed() from matching inside succeededorfailed() is subtle and correct; the unit tests for negated predicates cover the tricky cases well.

Generated by Rust PR Reviewer for issue #998 · 342.7 AIC · ⌖ 12.8 AIC · ⊞ 34.5K ·

…in resolve_source_path

Two bugs from PR #998 round 7:

- inspect::whatif::is_negated_call: switch from `&str[range]`
  indexing to byte-slice comparison via
  `bytes.get(idx - 4..idx) == Some(b"not(")`. The previous form
  panicked if the four bytes immediately before the matched call
  straddled a UTF-8 char boundary — possible if a non-ASCII display
  name leaked into the condition string. New regression test
  prepends `é` (a two-byte UTF-8 sequence) before `failed()` and
  asserts the classifier returns RunsAnyway instead of panicking.

- audit::pipeline_graph::resolve_source_path: a symlink at
  `/tmp/foo.md` → `/etc/passwd` lexically satisfied the `.md`
  extension check and was passed through unguarded. We now
  canonicalize absolute paths after the extension check and reject
  them if the resolved target does not also end in `.md`, closing
  the symlink-bypass vector. Legitimate `current.md` → `v1.md` style
  symlinks remain accepted because the resolved target still ends
  in `.md`. A new `has_md_extension` helper factors out the shared
  extension test. Two new unix-only regression tests cover both
  the rejected-symlink and the accepted-symlink cases. The
  security doc-comment now explicitly calls out the symlink-target
  re-check so future maintainers do not weaken it.

Validated locally: cargo build, cargo test --bin ado-aw (1888
unit tests pass on Windows; +2 unix-only symlink tests compile
behind cfg(unix)), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: looks good overall — solid architecture, strong security posture on the resolve_source_path path; one genuine bug in the unused-output lint rule worth fixing before merge.


Findings

🐛 Bugs / Logic Issues

src/inspect/lint.rs:consumed_outputs — false positives for same-job consumers

rule_unused_output determines whether an output is "consumed" by checking summary.graph.outputs_needing_is_output (the set of outputs that require isOutput=true because they're consumed across jobs). An output referenced by a peer step within the same job — via env_refs or condition_refs — won't appear in outputs_needing_is_output and will be incorrectly flagged as unused-output.

This is inconsistent with graph_deps::step_refs, which correctly walks both env_refs and condition_refs regardless of job boundary. The fix is to union in those same-job refs:

fn consumed_outputs(summary: &PipelineSummary) -> BTreeSet<(String, String)> {
    let mut consumed: BTreeSet<_> = summary
        .graph
        .outputs_needing_is_output
        .iter()
        .flat_map(|entry| entry.outputs.iter().map(|o| (entry.step.clone(), o.clone())))
        .collect();
    // Also count same-job consumers (env_refs + condition_refs);
    // these do not need isOutput=true but the output is still used.
    for (_, step) in all_steps(summary) {
        for r in step.env_refs.iter().chain(step.condition_refs.iter()) {
            consumed.insert((r.step.clone(), r.name.clone()));
        }
    }
    consumed
}

The existing test create_pull_request_fixture_has_no_unused_output_inspect_lint won't catch this because that fixture's outputs are all consumed cross-job; a regression test with a same-job consumer would pin the fix.


⚠️ Suggestions

src/mcp_author/mod.rs:parse_graph_dump_format"json" format silently unsupported

The CLI's GraphCmd::Dump accepts text, json, dot, but the MCP graph_dump tool only accepts text and dot — the error message explicitly says "expected 'text' or 'dot'". Either expose json in the MCP tool (trivial — just add it to the match) or document the intentional omission in the GraphDumpParams docstring and the MCP tool description.

src/audit/cli.rs:derive_post_processing on cache hits — pipeline graph from current source

When loading a cached run-summary.json, derive_post_processing re-runs populate_pipeline_graph against the current local source file. If the workflow has been updated since the build was cached, the pipeline_graph section will reflect today's code, not the build's actual code. For ado-aw audit this is the expected behaviour (always correlate against local IR), but it's worth a comment in derive_post_processing calling this out so a future maintainer doesn't "fix" it into using the cached graph.

src/compile/ir/summary.rs:PipelineSummary::all_jobsVec allocation on every call

all_jobs() returns Vec<&JobSummary> and is called on the hot paths for populate_job_edges, find_matching_job_summary, and the inspect traversals. For typical pipeline sizes this is fine, but returning impl Iterator<Item = &JobSummary> (or an inline Either-style enum) would avoid the per-call heap allocation without any API sacrifice since all callers only iterate.


✅ What Looks Good

  • whatif negation logicis_negated_call + is_word_boundary_before + contains_unnegated_call handle the not(failed()) / not(not(failed())) / not(succeededOrFailed()) / multibyte-char cases correctly and are thoroughly covered by targeted regression tests. The byte-slice approach in is_negated_call (avoiding &str[idx-4..idx] to prevent UTF-8 boundary panics) is the right fix.

  • resolve_source_path security model — the .md extension gate, .. component rejection, tilde prefix rejection, and post-canonicalize symlink-target re-check together close the key arbitrary-file-read vectors documented in the threat analysis. The residual-risk section is honest and actionable.

  • PAT redaction in Debug implsTraceFailureParams and AuditBuildParams both implement Debug manually to emit <redacted> for the pat field. This prevents accidental credential exposure in rmcp's internal error traces.

  • **Job(redacted) single-level qualifier** — correctly rejects Stage1.SubStage.Agent(two-dot paths) while acceptingPipeline.Agent`, with regression tests pinning both behaviours.

  • build_pipeline_ir passes skip_integrity: true — the read-only IR entry point correctly bypasses every write side-effect across all four compile targets.

  • Trust boundary preservedmcp-author is a new, separate server; Stage 1 sandboxed agents do not gain access to inspect/graph/whatif/audit tools.

Generated by Rust PR Reviewer for issue #998 · 380.9 AIC · ⌖ 12.2 AIC · ⊞ 32.9K ·

…positive + zero-alloc all_jobs + JSON graph_dump + cache-hit doc

Bugs:
- inspect::lint::consumed_outputs: union the set of outputs flagged
  by `graph.outputs_needing_is_output` with every step's `env_refs`
  and `condition_refs`. Same-job consumers do not appear in
  `outputs_needing_is_output` (ADO does not require isOutput=true for
  them), and the previous logic mis-classified those outputs as
  unused. Two new regression tests pin the fix for both env_ref and
  condition_ref same-job consumers; existing cross-job and fixture
  tests remain green.

Suggestions:
- mcp-author `graph_dump`: now accepts `"json"` alongside `"text"`
  and `"dot"`, matching the CLI's `graph` subcommand. GraphDumpParams
  docstring + tool description updated; the error message lists all
  three formats.
- audit::cli::derive_post_processing: doc-comment now explicitly
  records that cache-hit invocations correlate against the **current
  local source markdown** by design, so a future maintainer does not
  "fix" it into using a stale cached graph snapshot (the downstream
  findings rules depend on the fresh correlation).
- compile::ir::summary::PipelineSummary::all_jobs: switch the return
  type from `Vec<&JobSummary>` to `impl Iterator<Item = &JobSummary>`
  via a small `AllJobsIter` either-enum, removing the per-call heap
  allocation on the hot paths (`populate_job_edges`,
  `find_matching_job_summary`, the inspect traversals). All callers
  already chained `.into_iter()`; the redundant calls were dropped
  via `cargo clippy --fix`.

Validated locally: cargo build, cargo test (1890 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Clean layered design with solid security discipline in most places — after several rounds of fixes the major issues (PAT redaction, symlink bypass, byte-safe is_negated_call, same-job consumed_outputs false positive) are all resolved. One security concern remains open across every prior review round; one design issue with the new JSON dump format is worth a quick fix before merge.


Findings

🔒 Security Concerns

src/mcp_author/mod.rs:371-373source_path() accepts arbitrary paths without validation (still open)

fn source_path(path: &str) -> PathBuf {
    PathBuf::from(path)  // no checks at all
}

Every tool that takes a source_path parameter (inspect_workflow, graph_summary, graph_dump, step_dependencies, step_outputs, whatif, lint_workflow) feeds the caller-supplied string directly to build_pipeline_ir(), which opens and reads the file. This has been flagged in every previous round of review and is still unaddressed.

The mcp-author server runs as the invoking user and is explicitly intended for IDE/Copilot Chat use — an environment where the agent processes untrusted content (PR descriptions, issue comments, code). A prompt-injected request like inspect_workflow(source_path="../../.ssh/authorized_keys.md") would reach build_pipeline_ir, read the file, and return a parse error whose message includes the path.

The fix already exists in this PR: pipeline_graph.rs::resolve_source_path() implements exactly the right guards (.md extension check, .. component rejection, ~ prefix rejection, symlink re-check). Reuse it or extract it to compile::mod and call it from both sites. The doc-comment on resolve_source_path even warns "Do not weaken or remove the .md extension check... without also adding a containment check" — the MCP server should honour the same invariant.


🐛 Bugs / Logic Issues

src/mcp_author/mod.rs:190-199graph_dump with format="json" returns double-encoded JSON

// build_graph_dump(GraphFormat::Json) returns serde_json::to_string_pretty(&summary.graph)
// — a String containing JSON text.
let text_or_dot = inspect::build_graph_dump(&source, format).await?;
structured_result(GraphDumpResult { text_or_dot })   // serialised again → {"text_or_dot": "{...}"}

When format is "json", build_graph_dump returns a JSON-serialized GraphSummary string. mcp_author then re-serializes it into GraphDumpResult { text_or_dot: String }, so the MCP caller receives {"text_or_dot": "{\"step_locations\": [...]}"} — the inner JSON is a string literal that must be parsed twice.

graph_summary already returns a proper GraphSummary object for the same data. Either: (a) route json format through graph_summary instead, or (b) emit the pre-parsed value directly:

let format = parse_graph_dump_format(params.0.format.as_deref())?;
if format == GraphFormat::Json {
    let graph = inspect::build_graph_summary(&source).await.map_err(to_mcp_error)?;
    return structured_result(graph);
}
let text_or_dot = inspect::build_graph_dump(&source, format).await.map_err(to_mcp_error)?;
structured_result(GraphDumpResult { text_or_dot })

⚠️ Suggestions

  • src/inspect/whatif.rs:394,403b.chars().count() is called twice (once to allocate prev, once as the final index). A let b_len = b.chars().count(); at the top of levenshtein avoids the redundant O(n) scan.

  • src/audit/pipeline_graph.rs:101-102summary.all_jobs().into_iter()all_jobs() already returns impl Iterator; .into_iter() on an iterator is a no-op. Just .all_jobs().find(...).


✅ What Looks Good

  • resolve_source_path() — extension check + symlink canonicalize re-check + .. component rejection + ~ guard is thorough; the residual-risk doc-comment is accurate and actionable.
  • is_negated_call() — byte-slice comparison (bytes.get(idx-4..idx) == Some(b"not(")) is correctly crash-safe against non-ASCII boundaries.
  • **Job(redacted) — single-level dot restriction is well-documented and the regression test for Stage1.SubStage.Agent` pins the right boundary.
  • PAT redaction — manual Debug impls on TraceFailureParams and AuditBuildParams prevent PAT exposure in framework error traces; redacted_pat helper is clean.
  • AllJobsIter — zero-allocation iterator enum avoids the per-call Vec without reaching for Box(dyn Iterator).
  • consumed_outputs — correctly unions outputs_needing_is_output with same-job env_refs / condition_refs; the regression test consumed_outputs_do_not_emit_unused_output_inspect_lint pins the fix.
  • Test coverage — end-to-end CLI tests on the create-pull-request.md fixture plus targeted unit tests for every new rule and helper are well-chosen.

Generated by Rust PR Reviewer for issue #998 · 1.2K AIC · ⌖ 13 AIC · ⊞ 32.9K ·

…N + perf nits

Security:
- mcp_author::source_path: was previously `PathBuf::from(path)` with
  zero validation, so a prompt-injected MCP request such as
  `inspect_workflow(source_path="../../.ssh/authorized_keys.md")`
  would reach build_pipeline_ir and read the file. Now mirrors the
  guards already in `audit::pipeline_graph::resolve_source_path`:
  require a `.md` extension, reject `..` components and `~` prefix,
  and canonicalize absolute paths to re-check the extension on the
  symlink target. Function is now async (`tokio::fs::canonicalize`);
  all seven call sites were updated to `source_path(&...).await?`.
  Four new regression tests cover the rejection cases (non-md,
  `..`, `~`) plus the accepted relative `.md` case.

Bugs:
- mcp_author::graph_dump: `format = "json"` previously routed through
  build_graph_dump(Json), which returns a JSON *string*. Wrapping
  that in `GraphDumpResult { text_or_dot: String }` produced
  `{"text_or_dot": "<escaped JSON>"}` and forced callers to parse
  the inner JSON twice. The json format now short-circuits to
  build_graph_summary so callers receive a structured GraphSummary
  object. A new regression test decodes the result as GraphSummary
  to pin the contract.

Suggestions:
- inspect::whatif::levenshtein: cache `b.chars().count()` once
  instead of scanning twice (allocate + final index).
- audit::pipeline_graph::find_matching_job_summary, inspect::trace::stage_for_job,
  inspect::whatif::find_job: drop the redundant `.into_iter()`
  calls left over from the Vec-to-Iterator migration in
  `PipelineSummary::all_jobs`.

Validated locally: cargo build, cargo test (1895 unit tests pass +
all integration suites), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Overall well-engineered — the security-sensitive paths are handled carefully, anyhow usage is clean, and there's good test coverage for edge cases. Two findings worth addressing before merge.


Findings

🐛 Bugs / Logic Issues

add_downstream_impact_findings fires on jobs that may not have skipped

src/audit/findings.rs ~line 368

The finding title is "Downstream jobs skipped due to {job.name} failure" but the predicate is:

if !job.failed() || job.downstream_jobs.is_empty() { continue; }

downstream_jobs is populated from IR graph edges — it includes all IR dependents, including jobs with an always() condition that would actually run. The description embeds the real runtime classification (so it won't show incorrect data), but the title categorically asserts "skipped" even when some downstream jobs succeeded through a bypass condition.

The practical impact today is low (canonical Setup → Agent → Detection → SafeOutputs pipelines don't use always() on the main jobs), but as pipelines grow more complex this will mislead. Two minimal fixes:

  • Option A — change title to "Downstream jobs potentially impacted by {job.name} failure" and keep the detail in the description.
  • Option B — filter the finding to only fire when at least one downstream job actually skipped:
    let any_skipped = job.downstream_jobs.iter().any(|id| {
        audit.jobs.iter()
            .find(|c| c.matches_ir_id(id))
            .map(|j| j.result.as_deref().unwrap_or_default()
                 .eq_ignore_ascii_case("skipped"))
            .unwrap_or(false) // absent = expected-to-skip counts
    });
    if !any_skipped { continue; }

⚠️ Suggestions

PipelineGraphSection.graph: PipelineSummary naming creates .graph.graph. chains

src/audit/model.rs ~line 307 / src/inspect/trace.rs ~line 197

The field pub graph: PipelineSummary inside PipelineGraphSection composes with PipelineSummary.graph: GraphSummary to produce navigation like:

// trace.rs build_step_report
let location = graph   // &PipelineGraphSection
    .graph             // PipelineSummary  ← this is called "graph"?
    .graph             // GraphSummary
    .step_locations
    .iter()
    ...

Renaming the field on PipelineGraphSection to pub summary: PipelineSummary (and updating callers) eliminates the confusion at no cost. The PipelineGraphSection concept already captures the "graph" context in its own name.


String-based --direction / --format args not validated by clap

src/main.rs ~line 187–230 (the GraphCmd variants)

Both direction and format are declared as String with default_value, then manually matched at dispatch time. This means an invalid value is caught at runtime (anyhow::bail!) rather than at argument-parse time. Using a clap value parser (or just impl From<&str> + ValueEnum) would surface the error earlier with better help text. Minor quality-of-life issue, not a blocker.


✅ What Looks Good

  • resolve_source_path + mcp_author::source_path security model is thorough: .md extension gating, .. / ~ rejection, symlink re-check after canonicalize, documented residual TOCTOU risk, and targeted tests for each vector (symlink-bypass, traversal, tilde expansion). Hard to fault.
  • is_negated_call byte-slice indexing correctly avoids the panic on multi-byte UTF-8 boundaries that would have occurred with the naive [idx - 4..idx] slice approach. The regression test with é is a good catch.
  • matches_ir_id single-level prefix guard — the regression comment explaining why rsplit('.').next() was wrong and the test covering "Stage1.SubStage.Agent" are both appreciated.
  • PAT redaction in TraceFailureParams/AuditBuildParams Debug impls is exactly right; the rmcp framework logs errors with {:?} which would otherwise expose credentials.
  • Trust boundary between the new mcp-author server (read-only, author-facing) and the existing SafeOutputs MCP (write, sandboxed pipeline Stage 1) is clearly preserved and documented.
  • build_pipeline_ir / check_pipeline alignment — applying codemods in-memory only without rewriting disk is the right semantic for a read-only inspection path.

Generated by Rust PR Reviewer for issue #998 · 940.7 AIC · ⌖ 12.5 AIC · ⊞ 34.5K ·

… .graph.graph + clap value-enum

Addresses PR #998 round 10:

Bugs:
- audit::findings::add_downstream_impact_findings: filter the finding
  to only fire when at least one downstream job actually skipped or
  was cancelled (or was absent from the runtime timeline, which
  signals an expected skip). Previously the rule fired whenever an
  upstream failed and had any IR-derived downstream — including
  cleanup jobs with always() that successfully ran. Title also
  reworded to "potentially impacted by {} failure" so even with the
  gate in place we do not categorically assert "skipped" for jobs
  that may have bypassed via always(). Recommendation reason
  updated to match. New `is_skipped_or_cancelled` helper handles
  US/UK spelling. New regression test pins the bypass-suppression
  behaviour; the existing test was updated to match the new title.

Suggestions:
- audit::model::PipelineGraphSection: rename field `graph:
  PipelineSummary` to `summary: PipelineSummary` so trace.rs no
  longer reads `.graph.graph.step_locations`. Callers updated
  (`pipeline_graph.rs`, `trace.rs`), and the same-named doc in
  `docs/audit.md` switched from `pipeline_graph.graph` to
  `pipeline_graph.summary`.
- main::GraphCmd::Dump/Deps: switch `--format` and `--direction`
  from `String` + manual match to `clap::ValueEnum` derives on
  `GraphFormat` / `GraphDepsDirection`. Invalid values are now
  caught at parse time with clap's standard "possible values"
  help text instead of a runtime `anyhow::bail!`. Existing
  integration test updated to match the new clap rejection
  message.

Validated locally: cargo build, cargo test (1896 unit tests + all
integration suites pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Well-designed addition — the security model is solid and test coverage is thorough. One real bug to fix before merge, one minor security inconsistency to address, and a couple of low-priority notes.


Findings

🐛 Bugs / Logic Issues

src/audit/pipeline_graph.rs ~line 19 — corrupt aw_info.json aborts the audit instead of warning

let source = match read_source_from_aw_info(run_dir).await.transpose()? {

read_source_from_aw_info returns Some(Err(_)) when the file exists but fails to read or has malformed JSON. transpose()? converts that into a hard Err that propagates out of populate_pipeline_graph, aborting the entire audit. This contradicts the function's documented "warn and continue" contract (which is correctly applied to resolve_source_path failures just a few lines later).

A corrupt artifact from a bad run is a realistic scenario; it should degrade gracefully:

let source = match read_source_from_aw_info(run_dir).await {
    Some(Ok(s)) if !s.trim().is_empty() => Some(s),
    Some(Err(err)) => {
        record_warning(
            audit,
            "audit::pipeline_graph",
            format!("failed to read aw_info.json: {err:#}; skipping IR graph correlation"),
        );
        return Ok(());
    }
    _ => audit.overview.aw_info.as_ref().and_then(|info| info.source.clone()),
};

🔒 Security Concerns

src/mcp_author/mod.rs source_path — backslash normalization missing for relative paths

resolve_source_path in pipeline_graph.rs calls normalize_source_path (replaces \ with / on non-Windows) before the ParentDir component check, so a crafted path like ..\\workflow.md is correctly rejected on Linux. The source_path function in mcp_author/mod.rs omits this normalization step. On Linux, PathBuf::from("..\\workflow.md") yields a single Normal("..\\workflow.md") component — no ParentDir — so the traversal guard doesn't fire, and the path passes. Since \ is a valid filename character on Linux, the open would then fail with ENOENT rather than returning a real file, so the blast radius is an opaque read error rather than data exfiltration. The mcp-author server runs in an IDE context where prompt-injected requests are the threat model, so this is worth closing. The fix is one line — call normalize_source_path (or inline an equivalent str.replace('\\', "/")) on the input before constructing PathBuf.


⚠️ Suggestions

src/inspect/whatif.rs + src/inspect/graph_deps.rs — duplicated reachable_edges BFS

Both modules implement nearly identical BFS traversal over &[EdgeEntry] independently. The graph_deps version is strictly more capable (supports both directions); the whatif version is always-downstream. Consider lifting the bidirectional form into a small shared helper (e.g. in compile::ir::summary or a new inspect::graph_traversal module) to eliminate the divergence risk.

src/audit/pipeline_graph.rs + src/mcp_author/mod.rs — path validation duplicated with slight drift

The .md-extension check + .. rejection + ~ rejection + symlink re-check logic appears in both files with minor differences (backslash normalization, absolute vs. relative return value). A shared validate_source_path(s: &str) -> Result<PathBuf> in compile::mod or a new compile::path_guard module would prevent future drift between the two call sites.


✅ What Looks Good

  • whatif condition classifier: The is_negated_call parity counter correctly handles not(not(failed()))RunsAnyway, and the is_word_boundary_before guard neatly prevents failed() from firing inside succeededorfailed(). The coverage-limitations docstring is honest about the string-literal false-positive gap.
  • Path security model in pipeline_graph.rs: Thorough — .md extension gate, .. and ~ guards on relative paths, symlink re-check on absolute paths, well-documented residual risks. The test matrix (non-markdown absolute, parent-traversal, tilde, .md→non-.md symlink, .md.md symlink) covers the obvious attack surfaces.
  • Trust boundary: mcp-author is cleanly separated from the Stage 1 SafeOutputs MCP; the read-only surface doesn't bleed write capabilities into sandboxed agents.
  • build_pipeline_ir skip_integrity flag: Passing true for all targets in the read-only path is the right call — no lock file writes triggered by an inspect command.
  • dispatch_lint exit code: The process::exit(1) path in main.rs correctly mirrors tsc/eslint conventions and won't leave async resources in a bad state since lint has no long-lived handles.

Generated by Rust PR Reviewer for issue #998 · 614 AIC · ⌖ 12.7 AIC · ⊞ 34.5K ·

…h_guard + reachable_edges + fix corrupt aw_info abort

Bugs:
- audit::pipeline_graph::populate_pipeline_graph: a corrupt
  aw_info.json (read error or malformed JSON) previously aborted the
  entire audit via `transpose()?`. Now downgraded to a recorded
  warning, matching the documented "warn and continue" contract that
  already applies to resolve_source_path / missing-source failures.
  New regression test writes `{not valid json` and asserts the
  populate call succeeds with the expected warning.

Security:
- mcp_author::source_path: the previous implementation built
  PathBuf::from the trimmed input directly, so `..\\workflow.md` on
  Linux became a single Normal component and slipped past the
  ParentDir traversal check. The shared guard now normalises
  platform separators before the component check, closing the
  bypass. A backslash-traversal regression test in
  src/mcp_author/tests.rs pins the fix.

Refactors (deduplicate audit + mcp-author validation logic):
- New `src/compile/source_path_guard.rs` exposes
  `validate_workflow_source_path(source: &str) -> Result<ValidatedSourcePath>`
  with the full security contract: trim + separator normalisation,
  `.md` extension check, `..`/`~` rejection, and symlink-target
  re-check on absolute paths. Six dedicated tests (including a
  unix-only symlink-bypass test) cover every guard.
- audit::pipeline_graph::resolve_source_path now delegates to the
  shared guard; the local `normalize_source_path` and
  `has_md_extension` helpers were removed.
- mcp_author::source_path now delegates to the shared guard and
  wraps the anyhow error as `McpError::invalid_params`.

Suggestion: shared BFS over edge lists.
- `inspect::graph_deps::reachable_edges` is now `pub` and accepts a
  direction. `inspect::whatif::reachable_edges` collapsed to a
  one-line downstream-direction wrapper that calls the shared
  helper, removing the BTreeMap/VecDeque duplication. Unused
  imports trimmed from `whatif.rs`.

Validated locally: cargo build, cargo test (1904 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Well-designed feature addition, but there is one security gap in source_path_guard worth fixing before merge — the relative-path branch does not cover symlink targets, which makes the module's stated security contract incomplete.


Findings

🔒 Security Concerns

src/compile/source_path_guard.rs + src/audit/pipeline_graph.rs — symlink extension re-check only fires for absolute input paths

The module doc promises:

refuses .md symlinks that resolve to non-.md targets

...and rejects_md_symlink_to_non_md_target test verifies this. However, the canonicalize-and-re-check-extension guard lives only in the if path.is_absolute() branch. The relative path branch returns cwd.join(&path) without canonicalization.

Then pipeline_graph.rs does:

let resolved_source_path = tokio::fs::canonicalize(&source_path)
    .await
    .unwrap_or_else(|_| source_path.clone());
let (_fm, pipeline) = crate::compile::build_pipeline_ir(&resolved_source_path)...

If a relative source like workflows/evil.md is a symlink to /etc/passwd, then after validate_workflow_source_path the path is cwd/workflows/evil.md (lexical .md extension, passes). canonicalize in populate_pipeline_graph then resolves to /etc/passwd, and build_pipeline_ir opens that file. The parse will fail, but serde_yaml error messages can surface line snippets from the input, creating a narrow information-leak path.

The attack is constrained — it requires both a malicious aw_info.json (plausible via prompt injection) and a symlink at the resolved local path — but the stated contract is not met and the existing test does not exercise the relative-path case.

Suggested fix in validate_workflow_source_path, relative path branch:

let joined = cwd.join(&path);
// Mirror the absolute-path symlink check.
if let Ok(canonical) = tokio::fs::canonicalize(&joined).await
    && !has_md_extension(&canonical)
{
    anyhow::bail!(
        "refusing source path '{normalized}': symlink resolves to non-`.md` target '{}'",
        canonical.display()
    );
}
Ok(ValidatedSourcePath { path: joined, normalized })

(And add a companion test with a relative symlink, mirroring the existing #[cfg(unix)] test.)


⚠️ Suggestions

src/inspect/whatif.rs:classify_conditionnot(succeeded()) is classified as Skipped but runs after failure

The documented limitations note that variable-based conditions are conservatively classified as Skipped. However, the pure-function pattern not(succeeded()) (run when parent did NOT succeed, i.e. including failure) is also conservatively misclassified because succeeded() is not in the bypass list. It's a low-frequency pattern in compiled pipelines, but it's worth one line in the ## Coverage limitations doc-comment so authors debugging a cleanup job aren't surprised by the conservative read.


✅ What Looks Good

  • source_path_guard design — centralising path validation so audit::pipeline_graph and mcp_author cannot drift apart is exactly the right approach. The .. component check, ~ prefix rejection, and the backslash-separator normalization (with the regression test for the Linux ..\\workflow.md bypass) are all solid.
  • Manual Debug for PAT-bearing paramsTraceFailureParams and AuditBuildParams both have manual Debug impls that print <redacted> for the PAT. This is easy to miss and was done correctly here.
  • is_negated_call byte-slice approach — using bytes.get(idx - NOT_PREFIX_LEN..idx) instead of a &str slice prevents a panic on multi-byte UTF-8 sequences adjacent to a condition call, and the specific regression test for this is a nice touch.
  • **Job(redacted) single-level guard** — the !prefix.contains('.')restriction preventingStage1.SubStage.Agentfrom matchingAgent` closes a real ambiguity, and the regression test makes the intent explicit.
  • build_pipeline_ir is skip_integrity = true — all four target dispatches pass this correctly so the inspect path never writes side effects.

Generated by Rust PR Reviewer for issue #998 · 400.7 AIC · ⌖ 12.9 AIC · ⊞ 32.9K ·

…classify_condition limitations

Security (relative-path symlink target re-check):
- compile::source_path_guard::validate_workflow_source_path
  previously only canonicalised + re-checked the `.md` extension on
  the **absolute-path** branch. A relative input such as
  `workflows/evil.md` that symlinked to `/etc/passwd` slipped past
  the lexical `.md` check; the eventual `canonicalize` inside
  `audit::pipeline_graph::populate_pipeline_graph` would then resolve
  to the arbitrary target, narrowing the documented contract.
- The relative branch now mirrors the absolute-path guard: join to
  cwd, canonicalize, and re-check the resolved extension. Module
  doc updated to call this out. New unix-only regression test
  (rejects_relative_md_symlink_to_non_md_target) creates the
  bypass setup inside a tempdir, switches `set_current_dir` under
  a Mutex, and asserts the resolver refuses with the same symlink
  rejection message.

Docs (inspect::whatif::classify_condition):
- Expanded the "Coverage limitations" section to enumerate every
  known static-analyser blind spot so authors debugging a what-if
  result are not surprised: not(succeeded()) misclassification
  (already listed), scoped predicate forms like
  failed('Setup'), bare canceled(), variable / dependency-result
  comparisons, surviving template ${{ }} expressions, ignored
  boolean composition, multi-line not(...) wraps, step-level
  conditions, and string-literal false positives. Closes the
  reviewer's "tell me more" follow-up. No behaviour change — the
  classifier remains a conservative lower bound, with
  inspect::trace as the authoritative source for any
  classification disagreement.

Validated locally: cargo build, cargo test (1904 unit tests + every
integration suite pass — +1 on Unix from the new symlink test),
cargo clippy --all-targets --all-features (clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Well-designed PR with a clean layered architecture and good security thinking. Two issues worth addressing before merge: a ..-traversal gap in the path guard for absolute paths, and a stage-condition propagation bug in whatif for staged pipelines.


Findings

🐛 Bugs / Logic Issues

src/inspect/whatif.rs — stage condition not propagated to job classification

In reachable_downstream_jobs, jobs that are added via the stage-edge traversal path are classified by classify_condition(&job.condition) only. When a downstream stage has condition: always() (or succeededOrFailed()) but its individual jobs carry no explicit condition (default succeeded()), those jobs are incorrectly reported as Skipped instead of RunsAnyway.

Concrete path: 1ES or stage target pipeline where the SafeOutputs stage has condition: always() (common cleanup pattern) but SafeOutputs jobs inside it carry the default condition. whatif --fail Agent would tell the author that SafeOutputs is skipped when it would in fact run.

StageSummary.condition is available in the summary — the fix is to look it up when classifying jobs added via the stage-edge branch:

// inside the `for downstream_stage in ...` loop, after resolving the stage:
let stage_classification = find_stage(summary, &downstream_stage)
    .and_then(|s| s.condition.as_ref())
    .map(|c| classify_condition(&Some(c.clone())))
    .unwrap_or(WhatIfClassification::Skipped);

// then, in the final `filter_map`:
classification: std::cmp::max_by_key(
    stage_classification,
    classify_condition(&job.condition),
    |c| matches!(c, WhatIfClassification::RunsAnyway) as u8,
),

Or more simply: if stage_classification == RunsAnyway, inherit it for all jobs in the stage.


🔒 Security Concerns

src/compile/source_path_guard.rsParentDir check does not apply to absolute paths

The module documentation states the guard "refuses parent-directory traversal", but the Component::ParentDir check only fires for relative paths. An absolute path such as /workspace/../../home/runner/.env.md passes through unchecked on the absolute branch.

This affects both call sites the module explicitly documents as untrusted:

  • audit::pipeline_graph — the source field from aw_info.json in a downloaded build artifact. A prompt-injected agent could write an adversarial absolute path there.
  • mcp_author — MCP parameters supplied by an IDE agent processing untrusted content (PR descriptions, issue comments).

The .md-only filter limits practical impact (a useful secret at /etc/passwd.md is unlikely), but it does diverge from the stated contract. Fix is straightforward — check ParentDir before the absolute/relative branch:

if path.components().any(|c| matches!(c, Component::ParentDir)) {
    anyhow::bail!("refusing source path with parent-directory components: '{normalized}'");
}

You'd need a dedicated test case: "rejects_absolute_path_with_parent_dir_component".


⚠️ Suggestions

src/inspect/lint.rs:89summary binding shadows the PipelineSummary parameter

pub fn report(summary: &PipelineSummary) -> LintReport {
    let findings = lint(summary);
    let summary = summarize_findings(&findings);  // shadows PipelineSummary with LintSummary
    LintReport { findings, summary }
}

The code is correct (the parameter is consumed before the shadow), but two bindings named summary with different types in the same visible scope is confusing. Suggest renaming the inner binding to counts or tally.

src/inspect/whatif.rs — suggestion always returned regardless of Levenshtein distance

closest() always returns the nearest candidate with no distance threshold. A completely unrelated input like whatif --fail xyzzy will suggest whatever IR id happens to be lexicographically nearest, which can be more misleading than no suggestion at all. A simple guard like if distance <= needle.len() / 2 + 2 suppresses low-quality suggestions.


✅ What Looks Good

  • is_negated_call byte-slice comparison — avoiding str indexing at idx - 4 to stay crash-safe on non-ASCII condition strings leaked from display names is exactly the right call. Tests for not(failed()), not(not(failed())), and not(succeededOrFailed()) (the substring-match regression) are thorough.
  • default_cache_root() centralisation — using $TMP/ado-aw/audit as a stable single cache root across every entry point (CLI, MCP, trace) correctly prevents ./logs/ debris in project working trees.
  • source_path_guard symlink-target re-check on both branches — checking that the resolved target also has .md extension (not just the link name) is solid, and rejects_relative_md_symlink_to_non_md_target is exactly the right regression test.
  • populate_job_edges uses assignment, not append — the cache-reload re-derivation path is idempotent.
  • classify_condition limitations documented inline — the "Coverage limitations" block is honest, exhaustive, and well-ordered by likelihood of surprising an author. This level of documentation on a best-effort analyser is exactly what future maintainers need.
  • derive_post_processing shared by fresh and cached paths — good; previously these two paths could have drifted.

Generated by Rust PR Reviewer for issue #998 · 901.1 AIC · ⌖ 12.8 AIC · ⊞ 32.9K ·

…parent-dir guard on absolute paths + suggestion threshold + lint shadow

Bugs:
- inspect::whatif::reachable_downstream_jobs: jobs added via the
  stage-edge traversal branch are now classified by combining the
  containing stage's condition with the job's own. Previously only
  job.condition was inspected, so a `condition: always()` on a
  downstream cleanup stage was silently dropped on the floor and
  inner jobs (which default to succeeded()) were reported as
  Skipped. New `stronger_classification` + `find_stage` helpers
  lift the stage-level bypass through to its jobs. A
  stages-bodied regression test pins the new behaviour.

Security:
- compile::source_path_guard::validate_workflow_source_path: the
  ParentDir + `~` check was previously gated behind the relative-
  path branch, so an adversarial absolute path like
  `/workspace/../../home/runner/.env.md` slipped through unchecked
  even though the module's stated contract refuses parent-
  directory traversal unconditionally. Hoisted the check before
  the absolute/relative split; updated the error message to
  drop "relative" since the rejection now applies to both kinds.
  New `rejects_absolute_path_with_parent_dir_component`
  regression test; existing tests + mcp_author tests updated to
  match the new wording.

Suggestions:
- inspect::lint::report: renamed the inner `LintSummary` binding
  from `summary` to `tally` so it no longer shadows the
  `PipelineSummary` parameter; the struct field on `LintReport`
  is still `summary` (no public-API change).
- inspect::whatif::closest and inspect::graph_deps::closest:
  added a Levenshtein distance threshold (`needle_len / 2 + 2`)
  so unrelated input like `whatif --fail xyzzy` no longer gets a
  noisy "did you mean" suggestion against whatever IR id happens
  to be lexicographically nearest. Single-typo cases like `Aget`
  → `Agent` still suggest. Two whatif regression tests
  (`closest_returns_none_for_unrelated_input`,
  `closest_suggests_single_typo_within_threshold`).

Validated locally: cargo build, cargo test (1908 unit tests + every
integration suite pass), cargo clippy --all-targets --all-features
(clean).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions

Copy link
Copy Markdown
Contributor

🔍 Rust PR Review

Summary: Looks good overall — solid architecture and excellent security work. One reproducible bug in the cache-hit path, one doc comment that undersells a behavioral difference.


Findings

🐛 Bugs / Logic Issues

  • src/audit/cli.rs — warning accumulation on repeated cache-hit runs

    derive_post_processing appends to audit.warnings via record_warning inside populate_pipeline_graph. On a cache-hit run, audit is loaded from the on-disk snapshot (which already contains any warnings from the previous run). If populate_pipeline_graph adds another warning (e.g., "source markdown not available locally"), the equality check audit != cached_audit_before_postprocess is true and the new audit — now containing one extra duplicate warning — is saved back. The next cache-hit run repeats: +1 warning, save, +1 warning, save...

    Minimal repro: run ado-aw audit <build-id> twice against a build whose source markdown is not present locally. The second run's report will have two identical "skipping IR graph correlation" warnings; the third will have three; and so on.

    Suggested fix — clear pipeline_graph-sourced warnings before re-running in derive_post_processing:

    async fn derive_post_processing(audit: &mut AuditData, run_dir: &Path) {
        // Drop stale pipeline-graph warnings before re-running correlation
        // so that repeated cache-hit runs don't accumulate duplicates.
        audit.warnings.retain(|w| w.source != "audit::pipeline_graph");
        if let Err(error) = pipeline_graph::populate_pipeline_graph(audit, run_dir).await { ... }
        ...
    }

⚠️ Suggestions

  • src/compile/mod.rs:build_pipeline_ir — doc comment overstates alignment with check_pipeline

    The comment says "Codemods are applied in memory only, matching check_pipeline's behavior". Codemods are applied (via parse_markdown_detailed) — that part is correct. But check_pipeline fails loudly when any codemod fires (parsed.codemods.changed()bail!), while build_pipeline_ir intentionally tolerates un-migrated pipelines. The "matching ... behavior" phrasing suggests the two are equivalent; they're not, and the difference is meaningful (inspect/graph/lint deliberately work on un-migrated sources without requiring a compile first). Worth making explicit:

    // Like `check_pipeline`, codemods are applied in memory and never written
    // back to disk. Unlike `check_pipeline`, a pending migration is tolerated
    // here — inspect/graph/lint should be usable on un-migrated sources.
    

✅ What Looks Good

  • source_path_guard.rs — moving the ParentDir + ~ check before the is_absolute() split is the right fix; the old layout left /workspace/../../home/runner/.env.md unchecked. The canonicalize + symlink-target extension re-check on both the absolute and relative branches is thorough, and the separator-normalisation for ..\\ on Linux closes the bypass cleanly. The targeted regression tests for each vector are excellent.

  • whatif.rsis_negated_call panic safety — using byte-slice .get(idx-4..idx) instead of &str[idx-4..idx] is the correct fix for multi-byte characters adjacent to failed() (the é test in the suite is a good canary). Double-negation handling (not(not(failed()))RunsAnyway) is cleanly unit-tested.

  • classify_condition coverage limitations — the doc comment enumerating known false-negatives (not(succeeded()), scoped-predicate forms, canceled()-only, variable-based conditions, template expressions) is exactly the right level of documentation for a best-effort static analyser. Callers won't be surprised.

  • **Job(redacted) / timeline_name_matches_job centralization** — pulling the stage-qualified name matching into one place and guarding against multi-level dot paths (Stage1.SubStage.Agent) prevents the drift that previously existed between audit::findingsandinspect::trace`.

  • AuditBuildParams / TraceFailureParams custom Debug — redacting the PAT field so rmcp error traces never emit the token in plaintext is easy to overlook; glad it's here.

  • Unified cache root — routing all three entry points (CLI audit, mcp-author audit_build, trace) through default_cache_root()\$\{TEMP}/ado-aw/audit prevents ./logs/ directories from appearing under arbitrary IDE working directories. Clean.

Generated by Rust PR Reviewer for issue #998 · 923.9 AIC · ⌖ 12.4 AIC · ⊞ 32.9K ·

@jamesadevine jamesadevine merged commit 64e59bc into main Jun 15, 2026
8 checks passed
@jamesadevine jamesadevine deleted the feat/agent-ir-tools branch June 15, 2026 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants