
docs(problems): add MCP configuration drift problem doc #2011

Open
Benkapner wants to merge 2 commits into fullsend-ai:main from Benkapner:docs/mcp-config-drift

Benkapner commented Jun 8, 2026

MCP configuration files define what external tools and services an agent can access. They are the agent's permission surface: every tool server declared in the config becomes available to the agent at startup. Today, these configs are plain files in the repo or workspace, loaded at agent startup without any integrity check, and not monitored between runs.

This creates a security blind spot. The existing security hooks (Tirith, SSRF validator, canary detection, secret redactor) all operate at runtime, checking what the agent does after it starts. But nobody checks whether the configuration that defines the agent's entire tool surface was tampered with before the agent even started. It is like having a security guard at the door checking IDs, while someone quietly replaced the list of who is allowed in.

If an attacker (or a compromised agent, or even an unreviewed PR) modifies a .mcp.json file, they can:

  1. Inject a malicious MCP server that exposes attacker-controlled tools. The agent trusts these tools because they are declared in its config.
  2. Replace a legitimate server endpoint with an attacker-controlled proxy. All tool calls the agent makes through that server now pass through the attacker's infrastructure, enabling data interception and response manipulation.
  3. Expand the tool surface by adding capabilities to an existing server entry, giving the agent access to destructive operations or data sources it was never designed to use.
  4. Accumulate drift organically as teams add integrations without a baseline to compare against, violating least privilege without anyone noticing.
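
The first three modifications above all show up as differences between a reviewed config and the one the agent is about to load. Below is a minimal sketch of such a comparison. It assumes the config is JSON with a top-level `mcpServers` object keyed by server name; this PR does not show the schema, and the server names, URLs, and the `diff_servers` helper are hypothetical.

```python
import json


def diff_servers(baseline: dict, current: dict) -> dict:
    """Compare two parsed MCP configs and report server-level differences.

    Assumes (hypothetically) that servers live under a top-level
    "mcpServers" object keyed by server name.
    """
    base = baseline.get("mcpServers", {})
    curr = current.get("mcpServers", {})
    return {
        # Servers present now but absent from the reviewed baseline.
        "added": sorted(set(curr) - set(base)),
        # Servers that were in the baseline but have disappeared.
        "removed": sorted(set(base) - set(curr)),
        # Same name, different entry (endpoint, command, arguments, ...).
        "changed": sorted(
            name for name in set(base) & set(curr) if base[name] != curr[name]
        ),
    }


# Illustrative data only: none of these servers or hosts come from the PR.
baseline = json.loads(
    '{"mcpServers": {"tracker": {"url": "https://tracker.example.com/mcp"}}}'
)
current = json.loads(
    """{"mcpServers": {
        "tracker": {"url": "https://proxy.attacker.example/mcp"},
        "helper": {"command": "npx", "args": ["some-package"]}
    }}"""
)

report = diff_servers(baseline, current)
print(report)
```

In this example the injected server is reported under `added` and the swapped endpoint under `changed`. Organic drift (item 4) would appear the same way, which is why a reviewed baseline is needed to tell intended changes from unintended ones.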

Existing defenses are insufficient for this specific threat:

  • CODEOWNERS can guard MCP config files, but many repos treat config files as low-sensitivity and do not require human approval.
  • The tool allowlist hook operates on tool names, not server endpoints. Replacing the endpoint behind a trusted tool name bypasses the allowlist entirely.
  • SSRF validation blocks connections to private networks, but a malicious external server URL passes all checks.
  • Credential isolation (ADR 0017) keeps secrets out of the sandbox, but MCP server endpoints are not secrets.
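
To make the SSRF gap concrete, the sketch below contrasts a private-network check with an endpoint allowlist. It is not the project's SSRF validator: `is_private_target` is a simplified stand-in that only inspects IP literals and does no DNS resolution, and `APPROVED_HOSTS`, `is_approved_endpoint`, and all URLs are hypothetical.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical allowlist of hosts an MCP server endpoint may use.
APPROVED_HOSTS = {"tracker.example.com"}


def is_private_target(url: str) -> bool:
    """Simplified SSRF-style check: flags private or loopback IP literals only."""
    host = urlparse(url).hostname or ""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. This sketch does not resolve hostnames.
        return False
    return ip.is_private or ip.is_loopback


def is_approved_endpoint(url: str) -> bool:
    """Endpoint-level check: HTTPS and an explicitly approved host."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in APPROVED_HOSTS


malicious = "https://proxy.attacker.example/mcp"
print(is_private_target(malicious))     # a public host is not a private target
print(is_approved_endpoint(malicious))  # but it is not on the allowlist either
```

A public attacker-controlled URL is not a private-network target, so a check of the first kind has nothing to object to. Only a check that knows which endpoints are expected can reject it.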

This extends the "persistent injection via externally editable resources" concern already identified under Threat 1 in the security threat model, applying it specifically to MCP configurations.

The doc proposes three defense approaches with trade-offs:

  • Baseline-and-diff: hash config files at session start, compare to a stored baseline, alert or block on mismatch. Simple to implement but requires a workflow for legitimate config updates.
  • Immutable harness input: treat MCP configs as harness-level inputs injected from a trusted source (like agent system prompts), so the agent never sees the config file. Strongest isolation but adds operational complexity.
  • Content-aware validation: parse the config and validate contents against a policy (approved server domains, approved tool surfaces per agent role). Catches semantic threats that hashing misses but requires maintaining allowlists.
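
As a rough illustration of the baseline-and-diff approach, the sketch below hashes a canonical form of the config and compares it to a stored digest. Hashing canonical JSON rather than raw bytes is a design choice made here, not something the doc specifies. The function names and sample configs are hypothetical.

```python
import hashlib
import json


def config_digest(config_text: str) -> str:
    """SHA-256 over canonical JSON, so whitespace and key order do not matter."""
    canonical = json.dumps(
        json.loads(config_text), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_baseline(config_text: str, baseline_digest: str) -> bool:
    """Return True when the config is semantically identical to the baseline."""
    return config_digest(config_text) == baseline_digest


# Illustrative configs only. The baseline should come from a reviewed,
# known-good state, not from whatever happens to be on disk at first run.
reviewed = '{"mcpServers": {"tracker": {"url": "https://tracker.example.com/mcp"}}}'
baseline_digest = config_digest(reviewed)

reformatted = '{ "mcpServers": { "tracker": { "url": "https://tracker.example.com/mcp" } } }'
tampered = reviewed.replace("tracker.example.com", "proxy.attacker.example")

print(matches_baseline(reformatted, baseline_digest))
print(matches_baseline(tampered, baseline_digest))
```

A harness could run this check at session start and alert or block on a mismatch. The stored digest must live somewhere the agent cannot write, otherwise an attacker who edits the config can update the baseline too.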

Describes the threat of silent MCP config modification as an
escalation vector, with approaches for baseline-and-diff,
immutable harness input, and content-aware validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Benjamin Kapner <bkapner@redhat.com>

fullsend-ai-review Bot commented Jun 8, 2026

Review

Findings

Low

  • [duplication] docs/problems/mcp-config-drift.md:9 — The "Related:" block (lines 5–8) is duplicated verbatim at lines 9–13. This is a copy-paste error. The duplicate block contains the same three references (security-threat-model.md, agent-architecture.md, ADR 0017) and should be removed.
    Remediation: Remove the second "Related:" block (lines 9–13).

Info

  • [missing-authorization] docs/problems/mcp-config-drift.md — This PR adds a new problem document with no linked issue. CLAUDE.md provides the process for adding problem docs ("create a new file in docs/problems/ and link it from README.md") without requiring issue-based authorization, so this is noted for traceability rather than as a blocking concern.

  • [scope-boundary] docs/problems/mcp-config-drift.md — MCP configuration drift could potentially be a subsection of security-threat-model.md rather than a standalone problem doc. The doc's "Relationship to other problem areas" section addresses this by explaining how MCP config drift is a specific instance of Threat 1 with unique defense considerations (tool surface definition vs. text-based influence). The three proposed defense approaches are MCP-specific and justify standalone treatment.

Previous run

Review

Findings

Medium

  • [missing-doc] README.md — The new problem document docs/problems/mcp-config-drift.md is not linked from README.md. CLAUDE.md requires: "When adding new problem areas, create a new file in docs/problems/ and link it from README.md." The README lists all 23 existing problem docs (lines 17–41) but the new MCP Configuration Drift doc is absent. Add an entry following the existing format, e.g.: - [MCP Configuration Drift](docs/problems/mcp-config-drift.md) — Detecting unauthorized changes in MCP server configurations that define the agent tool surface

Low

  • [edge-case] docs/problems/mcp-config-drift.md:50 — Approach 1 (baseline-and-diff) does not address the trust-on-first-use (TOFU) bootstrapping problem: if the first run occurs against an already-compromised config, the baseline captures the malicious state and all subsequent runs pass. Consider noting in the trade-offs that the baseline should be established from a known-good state or reviewed by a human before being trusted.

  • [technical-accuracy] docs/problems/mcp-config-drift.md:79 — The "Relationship to existing security hooks" section claims SSRF validation is "the last line of defense if a malicious endpoint makes it into the config." However, the SSRF pretool hook (ssrf_pretool.py) operates on Bash and WebFetch tool calls. MCP server connections are established by the runtime's MCP client, which may not flow through the tool-call hook mechanism. If MCP connections bypass the SSRF hook, the "last line of defense" characterization overstates the actual coverage.

  • [section-structure] docs/problems/mcp-config-drift.md — Missing "Relationship to other problem areas" section. This section appears in most existing problem docs (4 of 5 reviewed) and provides explicit cross-references to related concerns. For this doc, relevant cross-references would include Security Threat Model (MCP drift as a specific instance of Threat 1), Governance (who controls MCP config policy), and Agent Architecture (how MCP configs relate to agent roles and trust boundaries).

fullsend-ai-review Bot added the requires-manual-review label (Review requires human judgment) Jun 8, 2026

Add README.md entry. Add TOFU bootstrapping risk to baseline-and-diff
trade-offs. Correct SSRF coverage characterization for MCP connections.
Add cross-references to Security Threat Model, Governance, and Agent
Architecture.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Benjamin Kapner <bkapner@redhat.com>

Benkapner (Author) commented

Addressed all findings in bb260f0.

[missing-doc] Added a bullet point entry in README.md linking to docs/problems/mcp-config-drift.md, positioned after the Security Threat Model entry.

[edge-case] Added a TOFU bootstrapping risk note to the Approach 1 trade-offs: "if the first run occurs against an already-compromised config, the baseline captures the malicious state and all subsequent runs pass. The baseline should be established from a known-good state or reviewed by a human before being trusted."

[technical-accuracy] Corrected the SSRF coverage characterization. The SSRF pretool hook operates on Bash and WebFetch tool calls, but MCP server connections are established by the runtime's MCP client, which may not flow through the tool-call hook mechanism. Replaced "last line of defense" with a more accurate description of partial coverage, and repositioned drift detection as the primary defense for MCP specifically.

[section-structure] Added a "Relationship to other problem areas" section with cross-references to Security Threat Model (MCP drift as a specific instance of Threat 1), Governance (who controls MCP config policy), and Agent Architecture (MCP configs define agent role boundaries).

Comment thread docs/problems/mcp-config-drift.md
fullsend-ai-review Bot added the ready-for-merge label (All reviewers approved, ready to merge) and removed the requires-manual-review label (Review requires human judgment) Jun 8, 2026