Problem
MCPG backend failures are completely silent in the pipeline. When a stdio MCP backend fails to start (e.g., an `npx` timeout due to blocked network access), the pipeline reports success but the agent simply can't see its MCP tools. The only evidence is buried in MCPG's stderr logs, which (until recently) weren't even published as artifacts.
What happened
- MCPG starts successfully — `/health` returns 200
- MCPG registers routes for all configured backends (azure-devops, safeoutputs)
- Agent runs, tries to call an azure-devops tool
- MCPG lazily launches `docker run ... node:20-slim npx -y @azure-devops/mcp ...`
- `npx` can't reach the npm registry (AWF iptables blocks the container)
- MCPG times out after 30s, returns an error to the agent
- Agent silently falls back to working without the tool
- Pipeline step "succeeds" — no one knows the MCP was broken
Root cause
MCPG backends are lazily launched on first tool call. The /health endpoint at startup shows all servers as "stopped" (not yet started), which is normal. The actual failure only occurs when the agent calls a tool, and it's not surfaced in the pipeline log.
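This is why the startup health check can't catch the problem: a backend that will fail on first use is indistinguishable from one that simply hasn't been called yet. A small offline illustration (the payload shape here is assumed for illustration, not captured MCPG output):

```shell
# Assumed startup /health payload: every lazily-launched backend reports
# "stopped", whether or not it could actually start if asked to.
STARTUP='{"servers":{"azure-devops":{"status":"stopped"},"safeoutputs":{"status":"stopped"}}}'

# Collapse all per-server statuses: at startup there is only one, "stopped",
# so the health check carries no signal about launchability.
STATUSES=$(echo "$STARTUP" | jq -r '[.servers[].status] | unique | .[]')
echo "$STATUSES"   # prints: stopped
```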
Proposed solutions
1. Post-startup backend warm-up step (ado-aw — recommended)
After MCPG health check passes, add a pipeline step that eagerly probes each configured MCP backend via MCPG to force lazy initialization:
```shell
# Probe each backend to force lazy launch and detect failures early
for server in $(jq -r '.mcpServers | keys[]' "$GATEWAY_OUTPUT"); do
  echo "Probing MCP backend: $server"
  RESPONSE=$(curl -sf -X POST \
    -H "Authorization: Bearer $MCP_GATEWAY_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' \
    "http://localhost:80/mcp/$server" 2>&1) || {
    echo "##vso[task.logissue type=warning]MCP backend '$server' failed to initialize"
    echo "Response: $RESPONSE"
  }
done
```
This catches failures before the agent runs, making them visible in pipeline logs with an ADO warning annotation.
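The key-extraction step of the probe loop can be exercised offline. The file content below is fabricated for illustration; the real `$GATEWAY_OUTPUT` schema may differ, but the loop only relies on `.mcpServers` having one key per backend:

```shell
# Fabricated gateway output file with two configured backends
GATEWAY_OUTPUT=$(mktemp)
printf '%s' '{"mcpServers":{"azure-devops":{},"safeoutputs":{}}}' > "$GATEWAY_OUTPUT"

# Same key extraction the probe loop uses (jq emits keys in sorted order)
SERVERS=$(jq -r '.mcpServers | keys[]' "$GATEWAY_OUTPUT")
echo "$SERVERS"
```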
2. Post-agent MCPG health check (ado-aw)
After the agent completes, query MCPG /health and check per-server status. MCPG's health endpoint reports per-server state including "error" with lastError messages:
```shell
HEALTH=$(curl -sf "http://localhost:80/health")
echo "$HEALTH" | jq .
UNHEALTHY=$(echo "$HEALTH" | jq -r '.servers | to_entries[] | select(.value.status == "error") | .key')
if [ -n "$UNHEALTHY" ]; then
  echo "##vso[task.logissue type=warning]MCPG backends failed: $UNHEALTHY"
fi
```
3. MCPG eager launch mode (upstream — gh-aw-mcpg)
Request an upstream feature: a config flag (e.g., gateway.eagerLaunch: true) that makes MCPG launch all stdio backends at startup time instead of lazily. This would surface failures immediately in the health check, before the agent runs. Currently MCPG's Launcher.GetOrLaunch() is purely lazy — backends are only started when the first tool call arrives.
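One possible shape for the flag (hypothetical; neither the key name nor its placement in MCPG's config schema is confirmed here):

```yaml
# Hypothetical config sketch; eagerLaunch does not exist in MCPG today
gateway:
  eagerLaunch: true   # launch all stdio backends at startup, not on first tool call
```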
4. Compile-time ecosystem validation (ado-aw)
At compile time, warn when a containerized MCP uses npx but the corresponding ecosystem (node) isn't in the network allowlist. This is a static analysis check that catches the misconfiguration before it reaches the pipeline:
```text
Warning: MCP server 'azure-devops' uses container 'node:20-slim' with entrypoint 'npx',
but 'node' ecosystem is not in network.allowed. npx requires npm registry access.
Consider adding 'node' to network.allowed or using a pre-built container image.
```
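A sketch of the proposed check, here as a jq pass over a hypothetical compiled-workflow JSON. The config shape (mcpServers entries with `container`/`entrypoint`, plus `network.allowed`) is invented for illustration and will not match ado-aw's real internal representation:

```shell
# Hypothetical workflow config: npx entrypoint but no 'node' in network.allowed
CONFIG='{"mcpServers":{"azure-devops":{"container":"node:20-slim","entrypoint":"npx"}},"network":{"allowed":["defaults"]}}'

# Flag any server that uses npx while 'node' is missing from the allowlist
WARNINGS=$(echo "$CONFIG" | jq -r '
  (.network.allowed // []) as $allowed
  | .mcpServers | to_entries[]
  | select(.value.entrypoint == "npx" and ($allowed | index("node") | not))
  | "Warning: MCP server \(.key) uses entrypoint npx but node is not in network.allowed"')
echo "$WARNINGS"
```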
Recommendation
Implement #1 (warm-up probe) as the primary fix — it's the most reliable and doesn't require upstream changes. #4 (compile-time check) is a nice defense-in-depth addition. #3 would be ideal long-term but requires coordination with gh-aw-mcpg.
Context
- Add `node` ecosystem to ADO MCP extension's `required_hosts()`
- `tee` MCPG stderr (also in PR #252, "feat: generate mcp-config.json from MCPG runtime output instead of compile time")
- MCPG's health endpoint (`/health`) already supports per-server status: `running`, `stopped`, `error` with a `lastError` field