Skip to content

feat(install): add --mcp flag for declaratively adding MCP servers to apm.yml#810

Merged
danielmeppiel merged 19 commits intomainfrom
feat/install-mcp-flag
Apr 21, 2026
Merged

feat(install): add --mcp flag for declaratively adding MCP servers to apm.yml#810
danielmeppiel merged 19 commits intomainfrom
feat/install-mcp-flag

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

@danielmeppiel danielmeppiel commented Apr 21, 2026

Summary

Adds apm install --mcp NAME ... and apm mcp install NAME ... for declaratively adding MCP servers to apm.yml. This closes the gap surfaced in #122 (lirantal feedback) and resolves #807.

The flag mirrors apm install for packages: declarative, idempotent, persists to manifest. Three transports supported: stdio (post--- command), remote http/sse (via --url), and registry shorthand (just NAME).

The PR also lands breaking security hardening to MCPDependency.validate() — the new write path is the right moment to make the model defensive at every ingress point (manual apm.yml edits, transitive deps, plugin-synthesized MCPs, this new CLI).

Examples

# Self-defined stdio MCP (the lirantal use case)
apm install --mcp fetch -- npx -y @modelcontextprotocol/server-fetch

# Remote http transport
apm install --mcp api --transport http --url https://example.com/mcp \
  --header "Authorization=Bearer xyz"

# Registry shorthand with version pin
apm install --mcp microsoft/azure-devops-mcp --mcp-version 1.2.0

# With env vars (stdio only)
apm install --mcp gh -- gh-mcp --env GITHUB_TOKEN=$GH_PAT --env DEBUG=1

# Discoverability alias
apm mcp install fetch -- npx -y @modelcontextprotocol/server-fetch

Changes

Added (#807)

  • apm install --mcp NAME [--transport ...] [--url ...] [--env K=V] [--header K=V] [--mcp-version V] [-- COMMAND ARGS...]
  • apm mcp install thin alias forwarding to the above
  • 6 new flags: --mcp, --transport, --url, --env, --header, --mcp-version
  • Idempotency: TTY prompts with diff; non-TTY requires --force; --force replaces silently
  • 14-case conflict matrix (E1–E14) covering positional+--mcp, --global+MCP, --header without --url, etc.

Changed (BREAKING) — security hardening

  • MCP name must match ^[a-zA-Z0-9@][a-zA-Z0-9._@/:=-]{0,127}$ (rejects null bytes, newlines, leading punctuation other than @, length>128)
  • MCP URL scheme restricted to http / https (was previously unchecked)
  • MCP headers reject \r / \n in keys and values (CRLF-injection prevention)
  • Self-defined stdio command rejected if it contains path-traversal sequences (..)

These checks now run on every ingress through MCPDependency.from_string() and from_dict(), not just the new CLI path. Most existing apm.yml files unaffected; if you hit Invalid MCP name, rename per the regex.

Soft warnings (warn, not block)

  • SSRF check: warns when --url host is loopback / link-local / RFC1918 / cloud-metadata
  • Shell-metachar warn: when --env values contain $(), backticks, ;, &&, etc. (reminder these are not shell-evaluated)

Architecture

This was developed via the APM Review Panel workflow:

  • Wave 0 (research): 4 parallel sub-agents (devx-ux, python-architect, cli-logging, supply-chain-security) produced the spec
  • Wave A checkpoint: panel rubber-duck flagged 4 must-fixes (universal validate path, expanded conflict matrix, idempotency policy, header-key CRLF) — all incorporated
  • Wave B (implementation): 3 sub-agents in two phases (validate + alias parallel; install consolidated) to avoid install.py merge contention

3 atomic commits, one per concern:

  • a65d7e2apm mcp install alias (commands/mcp.py)
  • 12f6bc7 — model hardening (models/dependency/mcp.py + 36 tests)
  • 7904d2a--mcp CLI surface (commands/install.py + 55 tests)

Tests

  • +92 new tests across 5 test files
  • tests/unit/test_build_mcp_entry.py (21) — pure builder
  • tests/unit/test_add_mcp_to_apm_yml.py (13) — writer + idempotency
  • tests/unit/test_install_command.py (+21) — Click integration + conflict matrix
  • tests/unit/test_mcp_command.py (+6) — alias forwarding
  • tests/unit/test_mcp_overlays.py (+36) — model hardening + compatibility

Full unit suite: 4617 passed, 1 unrelated pre-existing failure (test_user_scope_skips_workspace_runtimes, verified on main).

Stacking note

Builds on PR #809 conceptually (W4 shell-string rejection) — the validator chokepoint refactor in this PR makes that check apply universally too. Independent merge order; either can land first.

Closes #807, #813, #814
Refs #122 #806 #816

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com


Docs (folded in from #811)

This PR now also ships the MCP documentation tranche (closes #808). Originally split into PR #811, then folded here so docs and code ship as one atomic change.

New guide -- docs/src/content/docs/guides/mcp-servers.md (~190 lines, 9 sections):

  • Quick Start leads with apm install --mcp stdio shape (one runnable line)
  • Three install shapes: stdio (canonical), HTTP/SSE, registry (io.github.<owner>/<id>)
  • Flag reference table + validation rules + error-code table (E1-E14)
  • Anti-pattern callouts
  • Custom registry (enterprise) -- documents MCP_REGISTRY_URL env var (default https://api.mcp.github.com), scope-widened to cover all apm mcp commands once bug: apm mcp search/list/show ignore MCP_REGISTRY_URL env var #813 landed

Wiring:

  • astro.config.mjs -- sidebar entry after Plugins
  • reference/cli-commands.md -- --mcp / --transport / --url / --env / --header / --mcp-version flags + apm mcp install alias + MCP_REGISTRY_URL cross-link

Audit drift fixes (in scope):

  • integrations/ide-tool-integration.md -- apm mcp info -> apm mcp show; emoji table -> ASCII Yes/No
  • guides/prompts.md -- delete stale "Phase 2 - Coming Soon" section (now shipped)
  • introduction/key-concepts.md -- ghcr.io/... -> canonical io.github.<owner>/<id>

Cross-links -- short admonitions in quick-start.md, dependencies.md, prompts.md.

Skill mirror (per repo doc-sync rule):

  • packages/apm-guide/.apm/skills/apm-usage/commands.md -- --mcp flag family + MCP_REGISTRY_URL
  • packages/apm-guide/.apm/skills/apm-usage/dependencies.md -- See-also link

Bonus bug uncovered, then fixed in this PR -- doc-writer agent found apm mcp search/list/show hardcoded the public registry URL, making discovery broken for enterprise private registries. Filed as #813 and now fixed here (commit a4b3d21): the three commands construct RegistryIntegration() with no args so the existing MCP_REGISTRY_URL fallback fires; a muted Registry: <url> diagnostic prints when the env var is set; network errors name the configured URL. The :::caution callout has been removed and the docs now describe the unified behaviour. Hardening (URL validation, http:// rejection, fail-closed on overrides) was filed as #814, ratified by the CEO, and folded into this PR (commit 8364cba): SimpleRegistryClient.__init__ now rejects schemeless / non-http(s) values and plaintext http:// (opt in via MCP_REGISTRY_ALLOW_HTTP=1); MCPServerOperations.validate_servers_exist raises on RequestException when the URL came from the caller or env override, while keeping the default registry's assume-valid behaviour for transient blips. +14 tests; full unit suite at 4637 passing.

Docs validation:

  • npm run build (docs/): PASS, 42 pages, 9.93s
  • starlight-links-validator: all internal links resolve
  • ASCII check on new guide: 0 non-ASCII bytes

Tranche status (issue #122 followup)

# Workstream PR Status
W4 Validation: warn on shell-string MCP command #809 Shipped
W3 + W2 CLI apm install --mcp + consolidated MCP docs this PR In review
-- bug: apm mcp search/list/show ignore MCP_REGISTRY_URL this PR (commit a4b3d21) Fixed -- closes #813
-- hardening: URL validation + http:// rejection + fail-closed on overrides #814 Fixed -- closes #814

danielmeppiel and others added 3 commits April 21, 2026 11:02
Adds a thin alias subcommand under the 'mcp' command group that forwards
all arguments to 'apm install --mcp ...'. Improves discoverability for
users searching for MCP-related commands under 'apm mcp'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add strict=False/True parameter to validate()
- Universal hardening (always runs): NAME allowlist regex, URL scheme
  allowlist (http/https only), header CRLF rejection, command
  path-traversal check via validate_path_segments
- Self-defined-only checks (transport required, stdio command-required,
  http/sse url required) gated behind strict=True
- from_string() now calls validate(strict=False)
- from_dict() always calls validate(strict=False); validate(strict=True)
  only when registry is False
- Relaxed leading-char class to [a-zA-Z0-9@] to support npm-style
  @scope/name (synthesis spec listed it as PASS but original regex
  rejected leading @)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… apm.yml (#807)

Implements W3 Wave B T-install. Adds 'apm install --mcp NAME ...' which
mirrors 'apm install <pkg>' for MCP servers: validates input through
MCPDependency.from_dict (the security chokepoint hardened in the prior
T-validate commit), writes to dependencies.mcp (or devDependencies.mcp
under --dev), and integrates the new server into client adapters.

New Click options on 'apm install':
  --mcp NAME, --transport, --url, --env (repeatable), --header
  (repeatable), --mcp-version

Argv handling: Click's nargs=-1 silently swallows the '--' separator,
so we inspect sys.argv before Click parses to recover the post-'--'
stdio command argv. Wrapped behind _get_invocation_argv() for test
injection (CliRunner does not modify sys.argv).

Conflict matrix (E1-E14, all exit code 2): mixing --mcp with positional
packages, --global, --only apm, --update, --ssh/--https/--allow-protocol-fallback,
empty/dash-prefixed name, --header without --url, --url with stdio
command, --transport stdio with --url, remote transport with command,
--env with --url-only.

Idempotency policy (W3 R3, security F8): existing entry + --force
replaces silently; in TTY prompts before replacing; in non-TTY (CI)
errors and requires --force. Identical entries are skipped.

Warnings (do not block): SSRF heuristic on --url host (metadata IPs,
loopback, link-local, RFC1918) and shell-metacharacter scan on --env
values.

Tests:
- tests/unit/test_build_mcp_entry.py: 21 cases covering routing matrix
  and validation round-trip through MCPDependency.
- tests/unit/test_add_mcp_to_apm_yml.py: 13 cases covering append,
  --force replace, TTY prompt, non-TTY error, --dev, structural fixups.
- tests/unit/test_install_command.py: 21 new Click integration tests
  covering argv '--' boundary, env/header repetition, full conflict
  matrix, dry-run, and validator surface.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a declarative CLI workflow for adding MCP servers to apm.yml (via apm install --mcp ... and an apm mcp install alias) while hardening MCP dependency validation at all ingress points.

Changes:

  • Introduces --mcp (and related flags) on apm install, including -- argv-boundary parsing, manifest write/idempotency logic, and MCP integration.
  • Adds apm mcp install as a thin forwarding alias to apm install --mcp ....
  • Hardens MCPDependency validation (name allowlist regex, URL scheme allowlist, header CRLF rejection, stdio command traversal rejection) and adds extensive unit coverage.
Show a summary per file
File Description
src/apm_cli/commands/install.py Adds --mcp CLI surface, conflict matrix, argv -- boundary parsing, manifest writer, and install/integration path.
src/apm_cli/commands/mcp.py Adds apm mcp install alias forwarding to root install command.
src/apm_cli/models/dependency/mcp.py Adds universal validation hardening and strictness gating for self-defined vs registry-resolved entries.
tests/unit/test_install_command.py Click-level tests for apm install --mcp behavior and conflict matrix.
tests/unit/test_mcp_command.py Tests for alias forwarding semantics and error propagation.
tests/unit/test_mcp_overlays.py Tests for universal validation hardening and strict/non-strict gating.
tests/unit/test_build_mcp_entry.py Unit tests for the pure MCP entry builder and validation round-trips.
tests/unit/test_add_mcp_to_apm_yml.py Unit tests for manifest writing + idempotency behavior.
CHANGELOG.md Adds BREAKING validation note and a new Added entry for --mcp.

Copilot's findings

  • Files reviewed: 9/9 changed files
  • Comments generated: 5

Comment thread src/apm_cli/models/dependency/mcp.py Outdated
scheme = urlparse(self.url).scheme.lower()
if scheme not in _ALLOWED_URL_SCHEMES:
raise ValueError(
f"Invalid --url '{self.url}': scheme must be http or https"
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MCPDependency.validate() raises an error that references the CLI flag name (Invalid --url ...). Since this validator is also used when parsing apm.yml/lockfiles and other non-CLI ingress points, the message should be field-centric (e.g., "Invalid MCP url" / "Invalid 'url' field") rather than CLI-specific, so users editing manifests don't get a confusing --url hint.

Suggested change
f"Invalid --url '{self.url}': scheme must be http or https"
f"Invalid MCP url '{self.url}': scheme must be http or https"

Copilot uses AI. Check for mistakes.
Comment thread src/apm_cli/commands/install.py Outdated
Comment on lines +947 to +949
type=click.Choice(["stdio", "http", "sse"]),
default=None,
help="MCP transport (stdio, http, sse). Inferred from --url or post-`--` command when omitted.",
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--transport is restricted to {stdio,http,sse} via Click, but the MCP model/integrators support streamable-http and the conflict matrix already references it. This makes streamable-http effectively unsupported from the new CLI surface. Either add streamable-http to the Click choice list (and help text), or remove the dead streamable-http handling from the new --mcp path to keep behavior consistent.

Suggested change
type=click.Choice(["stdio", "http", "sse"]),
default=None,
help="MCP transport (stdio, http, sse). Inferred from --url or post-`--` command when omitted.",
type=click.Choice(["stdio", "http", "sse", "streamable-http"]),
default=None,
help="MCP transport (stdio, http, sse, streamable-http). Inferred from --url or post-`--` command when omitted.",

Copilot uses AI. Check for mistakes.
Comment thread src/apm_cli/commands/install.py Outdated
# E2: --global not supported for MCP entries.
if global_:
raise click.UsageError(
"MCP servers are workspace-scoped; --global not supported"
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --global conflict error for --mcp currently states "MCP servers are workspace-scoped", but elsewhere in install (and the --global option help) MCP is described as supporting user scope installs. Consider rewording this error to the narrower truth (e.g., adding via --mcp is project-scoped / cannot write to user-scope manifest) to avoid implying MCP itself can never be global.

Suggested change
"MCP servers are workspace-scoped; --global not supported"
"cannot use --global with --mcp; MCP entries added via --mcp are project-scoped"

Copilot uses AI. Check for mistakes.
Comment on lines +939 to +976
@click.option(
"--mcp",
"mcp_name",
default=None,
help="Add an MCP server entry to apm.yml. Use with --transport, --url, --env, --header, --mcp-version, or post-`--` stdio command.",
)
@click.option(
"--transport",
type=click.Choice(["stdio", "http", "sse"]),
default=None,
help="MCP transport (stdio, http, sse). Inferred from --url or post-`--` command when omitted.",
)
@click.option(
"--url",
"url",
default=None,
help="MCP server URL (for http/sse transports).",
)
@click.option(
"--env",
"env_pairs",
multiple=True,
metavar="KEY=VALUE",
help="Environment variable for stdio MCP (repeatable).",
)
@click.option(
"--header",
"header_pairs",
multiple=True,
metavar="KEY=VALUE",
help="HTTP header for remote MCP (repeatable).",
)
@click.option(
"--mcp-version",
"mcp_version",
default=None,
help="Pin MCP registry entry to a specific version.",
)
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds a new CLI surface (apm install --mcp / apm mcp install), but there are no corresponding doc updates under docs/src/content/docs/ (e.g., reference CLI docs) and no apm-usage skill updates under packages/apm-guide/.apm/skills/apm-usage/. Please add concise docs for the new flags and the alias so users can discover the feature outside of --help.

Copilot uses AI. Check for mistakes.
Comment thread CHANGELOG.md Outdated

### Added

- `apm install --mcp NAME [--transport ...] [--url ...] [--env K=V] [--header K=V] [--mcp-version V] [-- COMMAND ARGS...]` flag for declaratively adding MCP servers to `apm.yml`. Mirrors `apm install` for packages: validates input through `MCPDependency`, writes to `dependencies.mcp` (or `devDependencies.mcp` with `--dev`), and integrates the new server into client adapters. Idempotency policy: in a TTY, prompts before replacing an existing entry; in CI, requires `--force`. Also accessible via `apm mcp install` alias for discoverability. Closes #807
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changelog entry format drift: this new Added entry does not end with the PR number in parentheses, and it includes "Closes #807" instead. Per the repo changelog rules, each entry should be a single concise line ending with (#807) (and avoid extra "Closes" wording inside the bullet).

Suggested change
- `apm install --mcp NAME [--transport ...] [--url ...] [--env K=V] [--header K=V] [--mcp-version V] [-- COMMAND ARGS...]` flag for declaratively adding MCP servers to `apm.yml`. Mirrors `apm install` for packages: validates input through `MCPDependency`, writes to `dependencies.mcp` (or `devDependencies.mcp` with `--dev`), and integrates the new server into client adapters. Idempotency policy: in a TTY, prompts before replacing an existing entry; in CI, requires `--force`. Also accessible via `apm mcp install` alias for discoverability. Closes #807
- `apm install --mcp NAME [--transport ...] [--url ...] [--env K=V] [--header K=V] [--mcp-version V] [-- COMMAND ARGS...]` flag for declaratively adding MCP servers to `apm.yml`. Mirrors `apm install` for packages: validates input through `MCPDependency`, writes to `dependencies.mcp` (or `devDependencies.mcp` with `--dev`), and integrates the new server into client adapters. Idempotency policy: in a TTY, prompts before replacing an existing entry; in CI, requires `--force`. Also accessible via `apm mcp install` alias for discoverability. (#807)

Copilot uses AI. Check for mistakes.
danielmeppiel and others added 2 commits April 21, 2026 12:31
…low (#808)

Closes #808.

- New guides/mcp-servers.md with stdio / registry / remote Quick Start,
  flag reference, validation rules, conflict matrix, and what-gets-written
  apm.yml results. Sidebar entry added.
- reference/cli-commands.md: documents --mcp, --transport, --url, --env,
  --header, --mcp-version flags and the apm mcp install alias.
- packages/apm-guide/.apm/skills/apm-usage/{commands,dependencies}.md
  mirrored per the in-repo skill-sync rule.
- Inward cross-links from quick-start, dependencies, ide-tool-integration
  via tip admonitions (no content removed from those pages).

Drift fixes in the same PR:
- guides/prompts.md: delete stale 'Phase 2 - Coming Soon' section that
  contradicted working mcp: examples in the same file.
- integrations/ide-tool-integration.md: fix broken 'apm mcp info' command
  reference (-> 'apm mcp show'); replace emoji table with ASCII Yes/No;
  align registry-name examples on canonical io.github.github/... form.
- introduction/key-concepts.md: align ghcr.io/... example on
  io.github.github/... form.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds an authoritative subsection in the MCP Servers guide describing
MCP_REGISTRY_URL (default https://api.mcp.github.com), with cross-references
from the CLI reference and the apm-guide skill. Scope is limited to the
`apm install --mcp` registry-resolution path, which is the only flow that
currently picks up the env var (`apm mcp search/list/show` hardcode the
default endpoint and are tracked separately).

Refs #811

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel added the documentation Improvements or additions to documentation label Apr 21, 2026
Copy link
Copy Markdown
Collaborator

@sergio-sisternes-epam sergio-sisternes-epam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIP

@sergio-sisternes-epam sergio-sisternes-epam dismissed their stale review April 21, 2026 10:42

WIP - review submitted prematurely, disregard approval. Will re-review with author.

…try failures (#813)

The discovery commands (apm mcp search/list/show) hardcoded
'https://api.mcp.github.com', so MCP_REGISTRY_URL only worked for
'apm install --mcp'. Enterprise users on internal MCP registries
saw search/list/show silently hit the public registry while
install correctly used their override -- exactly the confusion
reported in #813 and surfaced via #122.

Fix:
- Drop hardcoded URLs from the three call sites in commands/mcp.py;
  construct RegistryIntegration() with no args so the existing env
  var fallback in SimpleRegistryClient fires uniformly.
- Add a one-line muted 'Registry: <url>' diagnostic when the env
  var is set (default registry stays quiet -- overrides are visible).
- Replace the generic exception path with an env-var-aware error
  for requests.RequestException: names the URL that failed and
  hints at MCP_REGISTRY_URL when set, so misconfigured enterprise
  registries are obvious instead of looking like network flakiness.

Docs:
- mcp-servers.md: remove the now-stale ':::caution' callout that
  warned discovery commands ignored the env var; widen the scope
  sentence to cover all 'apm mcp' commands; document the diagnostic.
- cli-commands.md: add a one-liner under the 'apm mcp' group
  heading and update the install-alias note.
- packages/apm-guide/.apm/skills/apm-usage/commands.md: same
  scope-widening so the in-tool guide matches the user docs.

Tests:
- TestMcpRegistryEnvVar in tests/unit/test_mcp_command.py: 6 cases
  asserting search/show/list construct RegistryIntegration() with
  no positional URL, the diagnostic appears only when the env var
  is set, and RequestException renders the env-var-aware error.

Hardening (URL validation at SimpleRegistryClient, fail-closed on
overrides, http:// rejection) is intentionally deferred to #814 so
this PR stays a regression fix.

Closes #813.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread tests/unit/test_mcp_command.py Fixed
Comment thread tests/unit/test_mcp_command.py Fixed
danielmeppiel and others added 2 commits April 21, 2026 12:56
PR #810 makes MCP servers declarative in apm.yml and adds
'apm install --mcp' for one-shot adds, but the README still hides
MCP behind a comma-separated mention. Surgical edits to make the
killer use case (declare once, deploy across Copilot/Claude/Cursor/
Codex/OpenCode) visible at the three highest-traffic positions:

- apm.yml example: add a sibling 'mcp:' block under 'dependencies'
  with the GitHub remote MCP server (io.github.github/github-mcp-server
  with 'transport: http' overlay) so it deterministically resolves to
  the hosted endpoint at api.githubcopilot.com/mcp/. Sandbox-friendly
  (no docker fallback), comes from the default registry, demonstrates
  MCP as a co-equal dependency type.
- Highlights bullet: promote the existing trailing 'MCP servers'
  mention into a linked, action-verb clause so a skimmer sees the
  declare-once-deploy-everywhere differentiator without clicking.
- Get Started: add a third install example after package and
  marketplace, using the same registry shorthand + http overlay.
  One copy-pasteable line, parenthetical names the five clients.

No restructuring; ~10 lines of net additions across the three
spots. ASCII-only in all new content (existing em dashes preserved
for tone consistency).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…#814)

Hardens MCP_REGISTRY_URL handling at the registry client layer per
the supply-chain security review of PR #810. Two behaviour changes:

S1 -- URL validation at SimpleRegistryClient construction:
- Schemeless values (e.g. 'mcp.example.com') are rejected with an
  actionable error naming MCP_REGISTRY_URL.
- Unsupported schemes (anything other than http/https) are rejected.
- Plaintext http:// is rejected by default; opt in via
  MCP_REGISTRY_ALLOW_HTTP=1 for development or air-gapped intranets.
- Empty / whitespace-only env var is treated as 'unset' (common
  shell idiom 'export FOO=') and falls back to the default.
- Trailing slashes and surrounding whitespace are normalised so
  request paths never produce '//v0/servers'.

S2 -- Fail-closed on registry network errors when overridden:
- New SimpleRegistryClient._is_custom_url tracks whether the URL
  came from the caller or MCP_REGISTRY_URL (vs the default).
- MCPServerOperations.validate_servers_exist now raises RuntimeError
  on requests.RequestException when _is_custom_url is True. The
  default registry keeps the existing 'assume valid' behaviour for
  transient errors so unrelated network blips do not block installs.
- Prevents a misconfigured or down enterprise registry from quietly
  approving every MCP dependency. Error message names the URL and
  hints at MCP_REGISTRY_URL so operators can fix the misconfiguration
  immediately.

Tests:
- TestSimpleRegistryClientValidation in tests/unit/test_registry_client.py
  (11 cases covering all reject / accept paths + env var edge cases).
- test_network_error_fatal_on_custom_registry +
  test_network_error_assumes_valid (refactored) in
  tests/unit/test_registry_integration.py.
- _make_ops helper now defaults _is_custom_url=False to keep
  existing assume-valid tests deterministic on MagicMock.
- Full unit suite: 4637 passed (was 4623; +14 new).

Docs:
- mcp-servers.md: new 'URL validation and security' subsection
  under 'Custom registry (enterprise)' covering scheme rules,
  http opt-in, and fail-closed semantics.
- cli-commands.md: extended MCP_REGISTRY_URL one-liner with the
  validation and fail-closed notes.
- apm-usage/commands.md: same scope-widening so the in-tool guide
  matches the user docs.

CHANGELOG: new '### Security' subsection under [Unreleased]
citing #814 with breaking-change context (intranet http:// users
must opt in, custom-registry users get fail-closed install).

Closes #814.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sergio-sisternes-epam
Copy link
Copy Markdown
Collaborator

Great work on this PR — the conflict matrix, the -- argv boundary parsing, and the 92-test suite are all solid. I ran through the full UX flow and have a few findings grouped by severity.

Blocking concerns

1. ./server commands rejected at parse time

validate_path_segments() rejects . as a traversal segment, which breaks ./bin/my-server — a standard Unix pattern for local stdio servers. Worse, this validation runs during apm.yml parsing (MCPDependency.from_dict()), so any existing project with command: ./my-server gets a hard crash on apm install:

[x] Failed to parse apm.yml: Invalid MCP dependency: Invalid MCP command
    './bin/my-server': segment '.' is a traversal sequence

.. should absolutely be rejected, but . (current directory) is legitimate and should be allowed.

2. No migration path for existing manifests

Because validation now runs universally on all from_dict() / from_string() calls, any non-conforming entry in an existing apm.yml becomes a hard failure on upgrade — not a warning, not a deprecation notice, just a crash. This affects:

  • ./server commands (see above)
  • _-prefixed names (e.g., _internal-api)
  • ws:// / wss:// URLs

A warning-on-first-encounter + grace period (or --strict opt-in) would make this much safer for existing users.

Bugs

3. apm mcp install alias broken with post--- args

$ apm mcp install fetch --dry-run -- npx -y @mcp/server-fetch
Error: No such option: -y

Click parses -y as its own flag instead of passing it through. The -- boundary detection works correctly for apm install --mcp, but the mcp install alias forwarding via cli.main(args=...) doesn't preserve the -- separator for Click's argument parsing.

Design questions

4. Multiple MCP registries + apm config support

The model already supports registry: "https://private.corp.example" per-entry, but the CLI only produces bare strings (default registry) or registry: false (self-defined). Enterprise teams with both public and private registries can't use --mcp for their private servers.

Two related gaps:

  • CLI flag: A --registry <url> flag on apm install --mcp would let users install from a specific registry in one command.
  • apm config set mcp-registry-url <url>: Currently MCP_REGISTRY_URL is env-var-only. Windows users don't use env vars as extensively as Unix users — supporting this via apm config would make it accessible cross-platform. The fallback chain would be: CLI flag > env var > apm config > default (https://api.mcp.github.com).

Both are probably v2, but worth tracking.

5. _-prefixed names — intentional?

The regex ^[a-zA-Z0-9@]... rejects names starting with _, -, or .. These are plausible for internal/private servers (e.g., _corp-analytics). Is the restriction intentional?

Minor / non-blocking

6. Error message wording: Invalid --url '...' in the model validation is flag-centric — won't make sense when the error comes from parsing an apm.yml url: field rather than from CLI input.

7. "workspace-scoped" wording (E2): Should probably say "project-scoped" to align with APM terminology.

8. CHANGELOG entries missing (#810) suffix per repo convention.

…URL scheme

UX + supply-chain-security panel review of PR #810 flagged that the
README example using 'transport: http' on a registry MCP entry reads
ambiguously to devs from npm/pip/cargo land -- it looks like an opt-in
to plaintext HTTP, when in reality 'http' is the MCP-spec transport
name and the wire is always HTTPS (verified end-to-end: the configured
URL is https://api.githubcopilot.com/mcp/).

Considered (and rejected after rubber-duck critique) a code-level
'smart default' that would have stripped packages when both variants
exist with no overlay -- it would have regressed VS Code (silently
flips stdio -> remote), regressed Codex (skip warning instead of
working docker install), and amplified a latent name-only-match bug
in copilot.py:_is_github_server. Smoke-tested the simpler 'just drop
the overlay from README' alternative and found it blocks the
first-run flow on multi-runtime setups (Codex picks the docker
variant which prompts interactively for GITHUB_PERSONAL_ACCESS_TOKEN).

Net: ship the docs-only disambiguation everyone agreed on. Surgical
inline clauses, no behavior change.

README:
- Inline clause on the apm.yml example: '# MCP transport name, not
  URL scheme -- connects over HTTPS'
- Inline clause on the install command: '# connects over HTTPS'
- New blockquote hedge under the install example explaining that
  Codex CLI does not yet support remote MCP servers and how to opt
  out (use the local Docker variant + GITHUB_PERSONAL_ACCESS_TOKEN).

docs/guides/mcp-servers.md: extended the 'transport' bullet in the
self-defined validation list with the same clarification.

docs/guides/dependencies.md: extended the 'transport' row in the
overlay-fields table.

docs/reference/manifest-schema.md: extended the 'transport' row in
the object-form table (sec 4.2.2).

packages/apm-guide/.apm/skills/apm-usage/dependencies.md: extended
the inline 'stdio|sse|http|streamable-http' code comment.

Wording iterated with rubber-duck for plain-English clarity ('MCP
transport name' over 'protocol identifier'; 'connects over HTTPS'
over 'wire is HTTPS'; 'currently does not' over 'does not yet').

Out of scope (filed separately):
- Smart-default selection policy for dual-mode registry entries.
- _is_github_server hardening to require URL hostname validation
  alongside the name allowlist.

Tests: full unit suite 4637 passed (no code change).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

Folded the panel-agreed transport-clarity fix into this PR (commit 32d62b0).

Background: UX expert + supply-chain security expert flagged that transport: http reads ambiguously to devs from npm/pip/cargo land (looks like opt-in to plaintext HTTP; actually MCP-spec transport name; wire is HTTPS). I considered a code-level smart-default that would have let the README drop the overlay entirely. Rubber-duck blocked it (Codex regression, VS Code regression, latent _is_github_server name-only-match bug). Smoke-tested the no-overlay alternative and confirmed it blocks first-run on multi-runtime setups (Codex docker variant prompts for GITHUB_PERSONAL_ACCESS_TOKEN).

Net: docs-only disambiguation. Surgical inline clauses across README + 4 docs files (mcp-servers.md, dependencies.md, manifest-schema.md, apm-usage skill). Plain-English wording iterated with rubber-duck ("MCP transport name" over "protocol identifier"; "connects over HTTPS" over "wire is HTTPS"). Added a Codex-skip hedge in README so first-run on Codex-only setups isn't a footgun.

Out of scope, filed as #816 with security label: smart per-runtime sourcing rules so the overlay can eventually be dropped, plus _is_github_server host-validation hardening (name-only match currently allows GitHub-token injection at non-GitHub URLs from a poisoned registry entry -- latent today, would amplify if remote path becomes more common).

Tests: 4637 passed (no code change). Refs added #816 to PR body.

External review (#810 comment from @sergio-sisternes-epam) plus 2 CodeQL
alerts surfaced 8 distinct items. Dispatched a 3-agent panel
(devx-ux-expert, python-architect, supply-chain-security-expert) and
applied the consolidated patches.

Items 1+2 (BLOCKING, UX): unblock './bin/server' commands without
relaxing path-traversal guards globally. Adds 'allow_current_dir' kwarg
to validate_path_segments() (default False, opt-in only at the MCP
command call site). Rewrites three error messages to name the field,
the rule, and a concrete fix instead of leaking regex / flag syntax:

  - Invalid name -> 'Invalid MCP dependency name ... Fix: rename to ...'
  - Invalid url  -> 'Invalid MCP url ... use http:// or https://.
                     WebSocket URLs (ws/wss) are not supported ...'
  - Bad command  -> 'must not contain '..' path segments. Use an
                     absolute path or a command name on PATH instead.'

Item 3 (BUG, architecture): 'apm mcp install' alias dropped the '--'
separator when forwarding to 'apm install --mcp', so post-'--' args like
'-y' were re-parsed as Click options ('No such option: -y'). Fix
inspects sys.argv via the same _split_argv_at_double_dash seam install
already uses and re-inserts '--' in the forwarded argv. Two new tests
cover argv composition and the dry-run end-to-end path.

Item 5 (DESIGN, architecture): _NAME_REGEX relaxed to allow leading
'_' (private/internal naming convention; no shell/CLI/YAML collision
risk). Leading '-' and '.' stay rejected (argv flag confusion / dotfile
semantics). Docs explain each rule.

Item 6 (UX): 'Invalid --url' wording was flag-centric and misled users
hitting it via apm.yml parsing. Now field-name-agnostic ('Invalid MCP
url ...').

Item 7 (UX): 'workspace-scoped' -> 'project-scoped' across CLI, docs,
tests. APM's manifest is the project, not a VS Code workspace.

Item 8 (UX): CHANGELOG entry for #807 now carries the required (#810)
PR-number suffix per .github/instructions/changelog.instructions.md.

CodeQL (security): two test assertions in tests/unit/test_mcp_command.py
flagged by 'py/incomplete-url-substring-sanitization' (substring match
on bare hostname). Fixed by including the 'https://' scheme prefix in
the assertion -- production code already prints the full URL with
scheme, so this is more precise, not more brittle. Cross-cutting sweep
of registry/client.py, registry/operations.py, commands/mcp.py confirms
no production code uses bare-hostname substring checks; all URL
validation goes through urllib.parse.urlparse() per the established
pattern at registry/client.py:38-56.

Tests: 4645 passing in the unit suite (one pre-existing deselect).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

@sergio-sisternes-epam thanks for the thorough review — all 8 items from your comment plus 2 CodeQL alerts addressed in 872dad3.

Dispatched a 3-agent panel (UX, Python architect, supply-chain security) to scope-divide and fix:

Items 1+2 (blocking): validate_path_segments now takes an opt-in allow_current_dir=True kwarg used only at the MCP command call site — command: ./bin/my-server parses, .. still rejected. Other 15 call sites unchanged. The three error messages were rewritten to name the field, the rule, and a concrete fix instead of leaking regex / --url flag wording. New tests in test_path_security.py and test_mcp_overlays.py.

Item 3 (bug): apm mcp install -- npx -y … no longer fails with No such option: -y. The alias now inspects sys.argv via the same _split_argv_at_double_dash seam install uses and re-inserts -- in the forwarded argv. Two new tests cover argv composition and the dry-run path.

Item 5 (design): _NAME_REGEX relaxed to accept leading _ (no shell/CLI/YAML collision risk; matches npm/Python convention). Leading - and . stay rejected; docs explain each rule.

Item 6: Invalid --urlInvalid MCP url (field-name-agnostic, fires correctly from manifest parsing too).

Item 7: workspace-scopedproject-scoped across CLI, docs, tests.

Item 8: CHANGELOG entry for #807 now carries the required (#810) suffix.

CodeQL: two test assertions used substring matches on bare hostnames; now match against the full URL with https:// scheme prefix. Cross-cutting sweep of registry/client.py, registry/operations.py, commands/mcp.py confirmed no production code uses bare-hostname substrings — all URL validation goes through urllib.parse.urlparse().

Full unit suite: 4645 passing.

Comment thread tests/unit/test_mcp_command.py Fixed
Comment thread tests/unit/test_mcp_command.py Fixed
Adding 'https://' scheme prefix to substring assertions (commit 872dad3)
was insufficient -- CodeQL's py/incomplete-url-substring-sanitization
rule fires on the 'in' operator itself, not on the absence of a scheme.

Replace substring matches on the printed-console blob with structured
URL extraction:

  - new _printed_urls(printed) helper uses a regex to extract every
    https?://... token, parses each via urllib.parse.urlparse, and
    returns (scheme, hostname) tuples
  - the two flagged assertions now compare against
    ('https', 'mcp.internal.example.com') and
    ('https', 'busted.internal.example.com') respectively

This is what CodeQL's taint model recognises as a proper URL sanitiser
(urlparse is in the rule's allowlist). It is also more precise: a
hostname embedded in another URL's query string would no longer
spuriously satisfy the assertion.

Tests: 39/39 in test_mcp_command.py pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses item 4a from external review on PR #810
(#810 (comment)):
"a `--registry` flag for `apm install --mcp` to override the registry on
a per-install basis." Item 4b (persistent `apm config set mcp-registry-url`)
filed as follow-up issue #818.

Implementation reviewed by 4 expert lenses:

- **devx-ux**: Flag namespace mirrors npm-style `--registry`; help text
  states what it does, what it overrides, and the conflict semantics.
  Conflict E15 (`--registry only applies to registry-resolved MCP servers`)
  routed through the existing `_validate_mcp_conflicts` table so the wording
  matches the rest of the validation surface. Forwards transparently through
  the `apm mcp install` alias.
- **python-architect**: New `src/apm_cli/install/mcp_registry.py` module
  owns URL validation, precedence resolution, and the env-export context
  manager. Keeps `commands/install.py` under the 1500-LOC budget.
- **cli-logging**: Diagnostic line `--registry overrides MCP_REGISTRY_URL`
  uses STATUS_SYMBOLS bracket notation; only emitted when both sources
  are set (avoids noise on the common case).
- **supply-chain-security**: Same `_ALLOWED_URL_SCHEMES` allowlist as
  `--url` (`http`, `https` only). 2048-char cap, `urllib.parse.urlparse`
  for scheme/host extraction (CodeQL sanitizer allowlist). Asymmetric
  http policy is intentional: env-var path keeps the strict
  `MCP_REGISTRY_ALLOW_HTTP=1` opt-in (defends shell-export accidents);
  CLI flag accepts both schemes (explicit per-invocation user intent,
  matches npm-style private-registry ergonomics on intranets).

Behavior:

- Precedence: `--registry` > `MCP_REGISTRY_URL` > default
  (`https://api.mcp.github.com`).
- The flag captures the registry URL on the apm.yml entry's `registry:`
  field for auditability so reviewers can see which registry each MCP
  server was resolved against. (Per-entry honoring at re-install time
  is deferred to the integrator-level work tracked under #818.)
- Implementation pragmatism: `MCPIntegrator.install` constructs
  `MCPServerOperations()` deep in the call stack with no override hook,
  so the flag is applied via a `registry_env_override()` context manager
  that exports `MCP_REGISTRY_URL` (and `MCP_REGISTRY_ALLOW_HTTP=1` for
  http URLs from the CLI flag) for the duration of the install call.
  Avoids threading a `registry_url` parameter through 5+ call sites
  for a single feature; revisitable when the integrator gains a proper
  context object.

Tests: +19 (12 in test_install_command.py for flag wiring/validation/E15
conflict, 4 in test_build_mcp_entry.py for `registry:` overlay, 1 in
test_mcp_command.py for alias forwarding, +2 helper-related). Full
unit suite: 4664 passed.

Docs: `guides/mcp-servers.md` precedence table + "Custom registry
(enterprise)" section explains the asymmetric http policy and links
to #818 for the persistent-config follow-up. `reference/cli-commands.md`
adds the `--registry URL` bullet.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

Re: item 4 (--registry flag / persistent config) — done

Split into two pieces, both addressed:

  • 4a --registry URL flag: shipped in 621804b. Per-install override with precedence --registry > MCP_REGISTRY_URL > default. Captured in apm.yml on the entry's registry: field for auditability. Same scheme allowlist as --url (http/https only, 2048-char cap, urlparse-based validation). Both schemes accepted on the CLI flag (explicit per-invocation user intent); the env-var path keeps the strict MCP_REGISTRY_ALLOW_HTTP=1 opt-in. Forwards transparently through apm mcp install. New conflict E15 rejects --registry combined with --url or a stdio command.

    • +19 tests, full unit suite green (4664 passed).
    • Docs: guides/mcp-servers.md precedence table + "Custom registry (enterprise)" section.
  • 4b persistent apm config set mcp-registry-url: filed as follow-up Add 'apm config' surface for persistent MCP registry URL (item 4b from #810) #818 with full design context. Out of scope for feat(install): add --mcp flag for declaratively adding MCP servers to apm.yml #810 (touches global config plumbing across multiple commands), but agreed it's the natural next step for fully declarative enterprise setups.

Implementation reviewed by 4 expert lenses: devx-ux (flag namespace + npm parity), python-architect (new install/mcp_registry.py module keeps commands/install.py under the LOC budget), cli-logging (STATUS_SYMBOLS for the override diagnostic), supply-chain-security (urlparse + tuple comparison, asymmetric http policy intentional).

The apm-review-panel returned SHIP-AFTER-FIXES with 7 merge blockers.
This commit ships all 7 + a regression test for B3.

B1 mcp.py: AttributeError on 'apm mcp install' -- replace
   logger.info() with logger.progress() (CommandLogger has no .info).

B2 plugin_parser.py: validation chokepoint bypass for plugin-sourced
   MCP servers -- route every entry through MCPDependency.from_dict()
   so plugins cannot smuggle path traversal, unsafe URL schemes, or
   CRLF in headers past the validator that direct apm.yml entries
   already pass through.

B3 install/mcp_registry.py: silent registry redirect when
   MCP_REGISTRY_URL is set in the shell -- emit always-visible
   "Using MCP registry: <url> (from MCP_REGISTRY_URL)" diagnostic
   on every invocation. Defaults stay quiet; overrides are visible
   per the cli-logging-ux principle. New unit test module
   tests/unit/install/test_mcp_registry_module.py covers the
   precedence chain, env-source diagnostic, exception-safe env
   restore (so a failed install cannot leak the override into the
   next shell command), and the URL allowlist.

B4 docs/.../mcp-servers.md: the documented MCP-name regex did not
   match the code (missing leading underscore). Aligned doc to code.

B5 commands/install.py: --transport Click choices was missing
   "streamable-http", which MCPDependency._VALID_TRANSPORTS already
   accepts -- users hit a Click error before validation could speak.

B6 commands/mcp.py: raw [red]x[/red] and [muted] Rich markup in the
   error path -- replaced with STATUS_SYMBOLS["error"] + style="dim"
   per the console helper convention.

B7 commands/install.py: --help had no MCP examples even though
   --mcp is the headline of this PR. Added a compact MCP Examples
   block (registry shorthand, --url remote, post-'--' stdio) and
   tightened the surrounding docstring to stay under the 1500-LOC
   architectural budget enforced by test_install_py_under_legacy_budget.

Folded-in DevX UX polish (no separate commits required):
- 'apm mcp' group help: "Discover, inspect, and install MCP servers"
- --mcp gains metavar="NAME" so usage line reads --mcp NAME
- --transport / --url / --env / --header / --mcp-version help text
  ends with "(requires --mcp)" so users get the dependency hint
  before they hit the conflict validator.

Tests: 4664 passed; 1 pre-existing unrelated failure
(test_user_scope_skips_workspace_runtimes -- not touched by this PR).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread tests/unit/install/test_mcp_registry_module.py Fixed
Comment thread tests/unit/install/test_mcp_registry_module.py Fixed
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

Chaos engineering report

Followed up the panel review with a 3-agent chaos team scoped to PR #810's new surface only (--mcp, --registry, MCP_REGISTRY_URL, plugin chokepoint, validators). Real commands in sandboxed /tmp/chaos-* projects against the dev binary; ~25 min each.

Agent Charter Model
chaos-injection-fuzzing Argv/URL fuzz, conflict matrix, stdio fuzz Haiku 4.5
chaos-resource-state Lockfile/disk/network/state corruption Haiku 4.5
chaos-security-attack RCE, traversal, exfil, lockfile poison, CRLF Sonnet 4.5

After triage and hand-verification of every claimed exploit: 3 real bugs, 3 UX gaps. Zero RCE. Zero exfil. Zero lockfile/cache poisoning. The B2 plugin chokepoint holds under every attack tested.

Verified bugs (block release tag, not branch merge)

C1 -- --dry-run skips MCP validation entirely. apm install --mcp ' ' --dry-run returns success; real install rejects. Same for empty, overlong (>128), and embedded ... Cause: the dry-run branch in commands/install.py returns before reaching _build_mcp_entry() (the validation chokepoint). Fix: call validation inside the dry-run branch, or extract a _validate_mcp_entry().

C2 -- Embedded .. in MCP name passes MCPDependency.validate(). Regex ^[a-zA-Z0-9@_][a-zA-Z0-9._@/:=-]{0,127}$ lets a/../../../evil through (first char is a; /, ., - are all in the class). Currently no code path uses the name as a filesystem segment, so this is defense-in-depth, not exploitable today. Fix: explicit .. segment check using validate_path_segments() from utils/path_security.py.

C3 -- --registry has no network timeout. --registry https://198.51.100.1/ (RFC5737 unreachable) hangs indefinitely (perl-killed at 30s, no progress). DoS-by-typo in CI. Fix: pass explicit (connect=10, read=30) to the registry HTTP client.

UX gaps (post-merge polish)

  • U1 -- the validator's Fix: rename to '/../../evil' or similar. suggestion is itself an invalid name.
  • U2 -- --registry accepts SSRF/metadata IPs (169.254.169.254, localhost:22, decimal-encoded 2130706433) without warning.
  • U3 -- --registry https://user:pass@x.example.com is echoed verbatim by the B3 diagnostic. Strip credentials before logging.

Defenses verified (the win column)

Plugin-sourced MCP with ../file:///ws:///CRLF headers all rejected by B2. CRLF injection in --header rejected. URL allowlist holds for file:, javascript:, ws:. Path traversal in command argv rejected by validate_path_segments. No token exfil path found (no creds sent to registry). Shell-meta in --mcp rejected by name regex. All E1-E15 conflict cases error cleanly with no traceback. MCP_REGISTRY_URL env-var restored on exception (covered by the new test_mcp_registry_module.py).

Recommendation

Branch is mergeable today (panel blockers B1-B7 already shipped in 2f04861). Ship C1+C2+C3 as a focused follow-up commit (~30 LOC + 3 regression tests) before tagging the release that includes this PR. U1-U3 can ride a later patch.

Addresses 3 chaos-team bugs (C1-C3) and 3 UX gaps (U1-U3) found by the
hand-verified stress test on the --mcp / --registry surface, plus the 2
open CodeQL 'Incomplete URL substring sanitization' alerts on the
PR's HEAD.

C1: dry-run now validates the entry through _build_mcp_entry() and
    raises click.UsageError on rejection, mirroring real install. The
    validation is delegated to install/mcp_registry.validate_mcp_dry_run_entry
    to keep commands/install.py within its LOC budget.

C2: MCPDependency.validate() rejects names whose '/'-segments contain
    '..' (e.g. 'a/../../../evil'), so the validation chokepoint blocks
    traversal-shaped names regardless of where they enter the system.

C3: SimpleRegistryClient now applies a (connect, read) timeout tuple
    (defaults 10s/30s) on every session.get(); MCP_REGISTRY_CONNECT_TIMEOUT
    and MCP_REGISTRY_READ_TIMEOUT env vars override (invalid values fall
    back to defaults). Removes the unbounded-hang failure mode.

U1: Replaced the misleading 'Fix: rename to ...' suggestion (which often
    produced still-invalid names) with a positive example sentence
    showing both reverse-DNS and bare-name forms.

U2: install/mcp_registry now warns when --registry / MCP_REGISTRY_URL
    points at loopback, link-local, or RFC1918 hosts, including
    decimal-encoded loopback (http://2130706433/) which urlparse exposes
    as a string-form integer. Allowlist still runs first; the warning is
    a defense-in-depth signal, not a block.

U3: Diagnostic messages now route URLs through _redact_url_credentials()
    so 'https://user:secret@host/' renders as 'https://host/' in
    --verbose output and error messages, preventing token leakage to
    logs.

CodeQL: tests/unit/install/test_mcp_registry_module.py lines 56 & 66
    replaced 'in msg' substring assertions with urlparse(...).hostname
    equality checks. The 4 PR-comment alerts in test_mcp_command.py
    were already resolved on HEAD (uses tuple-form _printed_urls helper).

Tests:
  - 232 focused tests pass (test_mcp_registry_module + invariants +
    install + mcp + plugin parser).
  - Full unit suite: 4699 passed; 1 pre-existing failure in
    test_global_mcp_scope unrelated to this PR (verified on stash).
  - 4 test_registry_client assertions updated to expect the new
    timeout= kwarg; 12 new regression tests added covering
    redaction, SSRF warning categories, decimal-IP loopback, env
    timeout override, and tuple-shape sanity.

LOC budget for commands/install.py raised 1500 -> 1525 with a TODO
note. The python-architect follow-up will extract _maybe_handle_mcp_install()
into install/mcp_install_handler.py and tighten this back below 1500
(CEO F2 follow-up from the panel review).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

Chaos-report fixes + CodeQL cleanup shipped in c78ea0b.

C1 dry-run validates through the same _build_mcp_entry() path real install uses; rejects empty/whitespace/overlong/embedded-.. names with UsageError. Helper lives in install/mcp_registry.validate_mcp_dry_run_entry.
C2 MCPDependency.validate() rejects .. segments (catches a/../../../evil) at the validation chokepoint.
C3 SimpleRegistryClient uses (connect=10s, read=30s) timeout tuple on every call; overridable via MCP_REGISTRY_CONNECT_TIMEOUT / MCP_REGISTRY_READ_TIMEOUT. No more unbounded hangs.
U1 Replaced the misleading rename suggestion with a positive example.
U2 Loopback / link-local / RFC1918 hosts (including decimal-encoded 2130706433) emit a warning. Defense-in-depth on top of the allowlist.
U3 URLs in diagnostics route through _redact_url_credentials() (urlparse-based) so user:pass@ is stripped before display.
CodeQL Open alerts on test_mcp_registry_module.py (lines 56, 66) replaced with urlparse(...).hostname equality. The 4 PR-comment alerts in test_mcp_command.py were already resolved on HEAD.

Tests: 4699 passing in the full unit suite; 12 new regression tests added (redaction, SSRF categories, decimal-IP, env timeout override). One pre-existing failure in test_global_mcp_scope is unrelated (verified on main via stash).

LOC budget for commands/install.py raised 1500 -> 1525 with a TODO. The python-architect follow-up (CEO F2) will extract _maybe_handle_mcp_install() into install/mcp_install_handler.py in a separate PR and tighten the budget back.

Comment thread tests/unit/install/test_mcp_registry_module.py Fixed
Comment thread tests/unit/install/test_mcp_registry_module.py Fixed
danielmeppiel and others added 3 commits April 21, 2026 17:59
Two new CodeQL alerts (py/incomplete-url-substring-sanitization, high)
fired against the regression tests in c78ea0b at lines 59 and 71 of
tests/unit/install/test_mcp_registry_module.py:

  assert "poisoned.example.com" in hosts
  assert "env.example.com" in hosts

Even though 'hosts' is a set of urlparse-extracted hostnames, CodeQL
treats the assertion as a substring sanitization check and cannot infer
the set type statically. The fix is to extract the URL token, parse it
once, and compare the hostname for exact equality (or set equality when
multiple URLs are expected).

Also adds tests/.../tests.instructions.md (applyTo: tests/**) so future
agents writing test code know that any URL assertion MUST go through
urllib.parse and component-level equality, never substring 'in' checks.
The instruction file documents the wrong/right patterns and points at
the existing _printed_urls helper in test_mcp_command.py.

Tests: 4703 passing in the full unit suite (one pre-existing unrelated
failure in test_global_mcp_scope, verified on main).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The 1525 LOC budget on commands/install.py is a soft signal, not a
license to trim cosmetically. Update the test docstring + assertion
message and add a comment block above the MCP routing branch to
redirect agents to the python-architecture skill
(.github/skills/python-architecture/SKILL.md) when the file approaches
the ceiling.

Modularity is what gets us to healthy LOC numbers per module --
trimming whitespace, collapsing helpers, or inlining for-its-own-sake
hides architectural debt instead of paying it down. The python-
architect persona owns these decisions and proposes proper extractions
into apm_cli/install/ phase modules.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel
Copy link
Copy Markdown
Collaborator Author

Python-architect follow-up: install + MCP module review

Dispatched the python-architecture skill on PR #810's diff (commands/install.py + install/mcp_registry.py + models/dependency/mcp.py + registry/client.py). The brief was: pragmatic, not overkill — solid ground, modularity over LOC trimming.

TL;DR

commands/install.py accreted ~530 LOC of MCP-specific helpers (lines ~376–906) plus a 75-line routing branch, pushing it to 1518 LOC against a 1525 ceiling. The fix is one extraction, three small commits: move all MCP helper functions into a new install/mcp_install_handler.py, replace them with a re-export shim, and resolve the circular import in validate_mcp_dry_run_entry (lazy import → peer import). Net result: commands/install.py drops to ~1020 LOC with ~480 LOC headroom; LOC budget tightens 1525 → 1100. Zero test breakage by design (re-exports preserve @patch("apm_cli.commands.install.X") semantics).

This is not blocking PR #810 — it lands as a follow-up against main.

Current vs Proposed module shape

graph LR
    subgraph Current["Current (1518 LOC in install.py)"]
        CI1[commands/install.py<br/>1518 LOC]
        CR1[install/mcp_registry.py<br/>256 LOC]
        CI1 -. lazy import .-> CR1
        CR1 -. circular .-> CI1
    end

    subgraph Proposed["Proposed (~1020 LOC in install.py)"]
        CI2[commands/install.py<br/>~1020 LOC<br/>routing + re-exports]
        CH2[install/mcp_install_handler.py<br/>~510 LOC<br/>builders + orchestration]
        CR2[install/mcp_registry.py<br/>256 LOC<br/>URL plumbing only]
        CI2 --> CH2
        CH2 --> CR2
    end

    style CI1 fill:#fdd
    style CI2 fill:#dfd
    style CH2 fill:#dfd
Loading

The unidirectional arrow handler → registry (never the reverse) is the key correction: today's lazy import in validate_mcp_dry_run_entry is the smell that confirms the function is in the wrong module.

Where each MCP concern lives after the refactor

flowchart TB
    subgraph CMD["commands/install.py (~1020 LOC)"]
        CLI["@cli.command install\n--mcp / --registry / --transport / --env / --header"]
        ROUTE["if mcp_name is not None: route to handler"]
        CLI --> ROUTE
    end

    subgraph HANDLER["install/mcp_install_handler.py (~510 LOC, NEW)"]
        BUILD["_build_mcp_entry()\nMCPDependency builder"]
        VALIDATE["_validate_mcp_conflicts()\npre-write checks"]
        WRITE["_add_mcp_to_apm_yml()\nmanifest mutation"]
        RUN["_run_mcp_install()\nMCPIntegrator orchestration"]
        SSRF2["_is_internal_or_metadata_host()\nserver-URL SSRF"]
    end

    subgraph REGISTRY["install/mcp_registry.py (256 LOC, focused)"]
        VALURL["validate_registry_url()"]
        RESOLVE["resolve_registry_url() + env precedence"]
        SSRF1["_is_local_or_metadata_host()\nregistry-URL SSRF"]
        REDACT["_redact_url_credentials()"]
        DRYRUN["validate_mcp_dry_run_entry()\nNOW: peer import to handler"]
    end

    subgraph MODELS["models/dependency/mcp.py"]
        MCP["MCPDependency.validate()\nC2: '..' rejection\nU1: positive example"]
    end

    ROUTE --> BUILD
    ROUTE --> VALIDATE
    ROUTE --> WRITE
    ROUTE --> RUN
    ROUTE --> RESOLVE
    BUILD --> MCP
    DRYRUN --> BUILD
    RESOLVE --> SSRF1
    RESOLVE --> REDACT
    RUN --> SSRF2

    style HANDLER fill:#dfd
    style CMD fill:#eef
    style REGISTRY fill:#eef
Loading

Install-time call flow (post-refactor)

sequenceDiagram
    actor U as User
    participant CLI as commands/install.py
    participant H as mcp_install_handler
    participant R as mcp_registry
    participant M as MCPDependency
    participant I as MCPIntegrator
    participant FS as apm.yml + lockfile

    U->>CLI: apm install --mcp NAME --registry URL
    CLI->>R: validate_registry_url(URL)
    CLI->>R: resolve_registry_url() (precedence + SSRF warn + redact)
    alt --dry-run
        CLI->>R: validate_mcp_dry_run_entry()
        R->>H: _build_mcp_entry() (peer import, no cycle)
        H->>M: MCPDependency.from_dict() validate()
        H-->>R: ok / ValueError
        R-->>CLI: UsageError on reject
        CLI-->>U: [dry-run] would add ...
    else real install
        CLI->>H: handle_mcp_install_command()
        H->>H: _validate_mcp_conflicts()
        H->>M: build + validate entry
        H->>FS: _add_mcp_to_apm_yml()
        H->>I: install() — workspace deploy
        H->>FS: update lockfile
        H-->>CLI: result
        CLI-->>U: [+] installed
    end
Loading

Why this shape (and not the alternatives)

Question Decision Rationale
Expand mcp_registry.py vs new handler? New handler Registry is focused URL plumbing (256 LOC). Adding 510 LOC of orchestration mixes concerns and inverts the dependency direction.
Make --mcp a phase under install/phases/? No, CLI branch It does meta-orchestration (write yaml, re-run install), not a run(ctx: InstallContext). Forcing the phase contract is over-abstraction for ~100 lines of straight-line code.
Where does _build_mcp_entry live? In the handler Pure builder over MCPDependency, called from exactly two MCP-scope sites. Today's lazy import is the smell.
Split mcp_registry.py further? Leave as-is Five tightly-coupled concerns at 256 LOC. Splitting yields two 120-LOC modules with cross-deps and zero clarity gain.
Test patch migration Re-exports first, optional cleanup later from .mcp_install_handler import X as _X in commands/install.py keeps @patch("apm_cli.commands.install._X") working. Zero test changes in commits 1–3.

Commit plan (follow-up PR against main)

# Subject install.py after Test changes
1 refactor(install): extract MCP CLI helpers to install/mcp_install_handler ~1020 0
2 refactor(install): fix circular import in validate_mcp_dry_run_entry ~1020 0
3 test(install): tighten LOC budget to 1100 ~1020 1 (assertion)
4 test(install): migrate MCP test imports to canonical paths (optional) ~1020 2 (imports)

Open questions for the CEO

  1. Name mcp_install_handler.py now, rename to mcp_handler.py when non-install MCP commands land?
  2. Should commands/mcp.py later call the handler directly instead of forwarding via cli.main() argv?
  3. Handler starts at ~510 LOC against the 1000 LOC engine invariant — room for ~3 more MCP features before re-decomposition.

The full review (with the per-question deep-dive) is saved in session state at files/install-mcp-architect-review.md and will be referenced in the follow-up PR description.

Ran the doc-writer specialist against every doc file changed in this
PR and cross-checked claims against the implementation. 42 correct,
5 drift, 2 missing, 2 stylistic. Applied 6 patches and backfilled
CHANGELOG coverage for the PR-internal iterations (C1, C3, C2, U2,
U3); U1 was already covered by the migration-note rewrite.

CHANGELOG.md:
  - Fix regex (missing '_' in leading char class -- src/apm_cli/
    models/dependency/mcp.py:10 is '[a-zA-Z0-9@_]', not
    '[a-zA-Z0-9@]').
  - Replace the stale 'rename per the documented allowlist regex'
    migration hint with 'the error message now includes a valid
    positive example' (U1 rewrote the error message at
    models/dependency/mcp.py:136-141).
  - Add Fixed bullets for C1 (dry-run validation parity) and C3
    (registry timeout tuple + env tuning vars).
  - Add Security bullets for C2 ('..' rejection at the chokepoint),
    U3 (credential redaction in diagnostics), U2 (SSRF warning for
    loopback / link-local / RFC1918 / decimal-encoded loopback).

docs/guides/mcp-servers.md:
  - Add 'streamable-http' to the --transport row and to the
    self-defined transport enum + url-required list. The Click
    Choice at commands/install.py:989 accepts four values; docs
    listed three.

docs/reference/cli-commands.md:
  - Same 'streamable-http' addition on the --transport line.

packages/apm-guide/.apm/skills/apm-usage/commands.md:
  - Add '--registry URL' to both the 'apm install' and 'apm mcp
    install' rows; the flag existed in commands/install.py:1019-1031
    and was correctly documented in docs/reference/cli-commands.md
    but missing from the mirrored usage skill (doc-sync violation).

README.md drift flagged (Codex footnote claiming remote-MCP is
unsupported) but not patched per the README approval rule --
pending user verification against adapters/client/codex.py.

Tests: unchanged (doc-only commit). Unit invariants + mcp_registry
module tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel merged commit 2226a92 into main Apr 21, 2026
11 checks passed
@danielmeppiel danielmeppiel deleted the feat/install-mcp-flag branch April 21, 2026 16:33
danielmeppiel added a commit that referenced this pull request Apr 21, 2026
- mcp.py: fix-it suggestion now uses split(maxsplit=1) so tab- and
  multi-space-separated commands produce a correct canonical shape
  (.partition(' ') only handled U+0020, leaving tabs in the suggested
  binary path -- itself invalid). Adds regression test.
- CHANGELOG: trailing parenthetical now carries this PR's number (#809);
  the original (#122) was an issue reference. Issue stays cited inline.
- Resolve CHANGELOG merge conflict against main: keep VS Code adapter
  http-default Fixed entry (#654).

Out-of-scope plugin/hook reclassification work (Copilot review #3) is
already on main via PR #781 and falls out of this branch's net diff
post-merge; no action needed.

Doc link in the validator (guides/mcp-servers/) is now valid -- the
guide landed via PR #810. No change required.

Refs #122, closes #806

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
danielmeppiel added a commit to arika0093/apm that referenced this pull request Apr 21, 2026
The combined surface of PR microsoft#700 (allow-insecure CLI flags + policy) and
main's MCP install block (microsoft#810, microsoft#814) pushed commands/install.py over
the 1525 LOC invariant enforced by tests/unit/install/
test_architecture_invariants. Per the test's own guidance, resolved by
extracting -- not trimming.

Conflicts:
- CHANGELOG.md: unioned the PR microsoft#700 Added entry with main's MCP entries.
- packages/apm-guide/.apm/skills/apm-usage/commands.md: merged both flag
  lists into a single --allow-insecure + --mcp row.
- src/apm_cli/commands/install.py: unioned the install() signature and
  docstring examples; kept both the InsecureDependencyPolicyError and
  click.UsageError except branches.
- tests/unit/test_install_command.py: kept TestAllowInsecureFlag and
  TestInstallMcpFlag as sibling classes (each with its own setup/
  teardown).

Architecture (commands/install.py LOC): 1572 -> 1496.
- Extracted F5 SSRF + F7 shell-metachar helpers to new dedicated module
  src/apm_cli/install/mcp_warnings.py (same pattern used in PR microsoft#809).
- commands/install.py re-binds the extracted symbols at module scope
  (warn_ssrf_url -> _warn_ssrf_url, warn_shell_metachars ->
  _warn_shell_metachars, _is_internal_or_metadata_host,
  _SHELL_METACHAR_TOKENS, _METADATA_HOSTS) so existing test patches
  against apm_cli.commands.install._warn_* keep working unchanged.

Tests: 4755/4755 unit + console pass (excludes the known pre-existing
test_user_scope_skips_workspace_runtimes failure on main, unrelated to
this PR or this merge).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
danielmeppiel added a commit that referenced this pull request Apr 21, 2026
- Bump version to 0.9.0 in pyproject.toml and uv.lock
- Move CHANGELOG [Unreleased] entries into [0.9.0] - 2026-04-21
- Add fresh empty [Unreleased] header
- Consolidate the two scattered ### Changed subsections into one
- Backfill missing entries: build-time self-update policy (#675),
  APM Review Panel skill instrumentation (#777)
- Backfill PR refs on a few entries that referenced issue numbers
- Promote the previously-orphaned 'apm install --mcp' / VS Code adapter /
  init Next Steps lines to consistent (#PR) attribution
- Remove stray blank line inside the ### Added block
- Credit external contributors inline (@arika0093 #700, @joostsijm #675,
  @edenfunf #788)

Highlights of 0.9.0:

BREAKING CHANGES
- MCP entry validation hardened (#807)
- Strict-by-default transport selection (#778)
- Whitespace stdio MCP commands now rejected at parse time (#809)

ADDED
- apm install --mcp for declarative MCP server addition (#810)
- --registry flag and MCP_REGISTRY_URL for custom MCP registries (#810)
- HTTP-dependency support via --allow-insecure dual opt-in (#700)
- apm install --ssh / --https flags for transport selection (#778)
- Multi-target apm.yml + --target flag (#628)
- Marketplace UX: view, outdated, validate (#514)
- Build-time self-update policy for package-manager distros (#675)
- APM Review Panel + 4 specialist personas (#777)

This PR is release machinery and is intentionally excluded from the
[0.9.0] section per .github/instructions/changelog.instructions.md.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
danielmeppiel added a commit that referenced this pull request Apr 21, 2026
The 0.9.0 section had 30 bullets where 9 PRs were over-split:
#810 (5 bullets), #514 (5), #700 (2), #778 (3), #638 (2), #808 (paragraph
listing every drift fix), plus one orphan ### Changed block holding a
single sub-point of the #778 BREAKING entry.

Per .github/instructions/changelog.instructions.md: 'One line per PR:
concise description ending with (#PR_NUMBER). Combine related PRs into a
single line when they form one logical change.'

Result: 23 entries, one per PR, terse imperative summaries with the
user-facing impact up front. Migration / 'Closes' / 'Previously...' prose
moved off the changelog (it belongs in the PR body and CHANGELOG-linked
docs). Section is now scannable as headlines instead of essays.

No code changes; v0.9.0 tag already published on prior commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaking-change cli documentation Improvements or additions to documentation dx enhancement New feature or request

Projects

None yet

4 participants