github · pelikhan · Apr 18, 2026 · Apr 18, 2026 · Apr 18, 2026 · Apr 18, 2026
diff --git a/docs/adr/26968-mcp-compile-tool-stdout-only-success-response.md b/docs/adr/26968-mcp-compile-tool-stdout-only-success-response.md
@@ -0,0 +1,79 @@
+# ADR-26968: MCP Compile Tool Uses Subprocess Stdout Exclusively for Success Responses
+
+**Date**: 2026-04-18
+**Status**: Draft
+**Deciders**: pelikhan, Copilot
+
+---
+
+## Part 1 — Narrative (Human-Friendly)
+
+### Context
+
+The `gh-aw` binary exposes a Model Context Protocol (MCP) server over stdio, where the JSON-RPC transport channel (stdout) must carry only well-formed JSON-RPC 2.0 messages. The `compile` tool invokes an external subprocess (the `gh-aw` binary itself in compile mode) to validate workflow files and return structured JSON results. Prior to this decision, there was ambiguity about whether subprocess stderr output — used for diagnostics and progress logging — should also be included in the MCP tool response text returned over the JSON-RPC channel. Mixing stderr into the response text caused JSON parsing failures when MCP clients attempted to parse the response as JSON, because diagnostic log lines are plain text, not JSON.
+
+### Decision
+
+We will configure the `compile` MCP tool to source its success response content exclusively from subprocess stdout, discarding stderr output from the tool response body. When the subprocess exits successfully, only the data written to its stdout (valid JSON) will be returned as the tool result text. Subprocess stderr is reserved for diagnostics visible at the process level (e.g., captured by a test harness or container logs) and must not appear in the JSON-RPC response. This invariant is enforced by both a unit test (mocking the subprocess command) and a binary-level integration test (launching the full MCP server and verifying raw stdio traffic).
+
+### Alternatives Considered
+
+#### Alternative 1: Include Both Stdout and Stderr in the Tool Response
+
+Concatenating stdout and stderr into the tool response was considered because it preserves all diagnostic information visible to an MCP client. This was rejected because MCP clients that expect the compile result text to be parseable JSON would encounter parse failures whenever the subprocess emitted any log line to stderr. The MCP protocol treats tool result text as opaque to the transport but clients — including the test harness — depend on the structured JSON content.
+
+#### Alternative 2: Route All Output Through Structured JSON on Stdout
+
+An alternative is to require the subprocess to emit all output (including diagnostics) as structured JSON on stdout, eliminating stderr entirely. This would allow richer error context in the tool response. It was not chosen because it would require invasive changes to the `compile` subcommand's logging infrastructure across all callers, not just the MCP server. The subprocess is also invoked directly by users and other tooling where plain-text stderr diagnostics are desirable.
+
+#### Alternative 3: Capture Stderr and Append It as a Separate Content Item
+
+The MCP tool result `content` array can hold multiple items. Stderr text could be appended as a second `TextContent` item tagged as diagnostic. This was considered but rejected because it complicates client logic (clients must filter by content item index or type to obtain the JSON result), and it leaks internal diagnostic format changes into the public tool response contract.
+
+### Consequences
+
+#### Positive
+- MCP clients receive a clean, parseable JSON response; no interleaved log noise can break JSON parsing.
+- The stdout/stderr split is a well-understood Unix convention; future subprocess implementers can follow it without reading MCP-specific documentation.
+- Binary-level integration tests provide strong regression protection at the transport boundary.
+
+#### Negative
+- Subprocess diagnostic output is invisible to MCP clients; clients that need to surface compiler warnings embedded only in stderr must rely on the subprocess writing those to stdout instead.
+- The unit test uses shell scripts (`sh -c`) to mock subprocess output; this creates a dependency on a POSIX shell being present in the test environment.
+
+#### Neutral
+- The integration test must be run against a pre-built binary (`make build`); it will skip automatically when the binary is absent. Teams must remember to build before running integration tests.
+- Stderr captured during integration tests is collected into a buffer but not asserted on, leaving stderr content unverified at the integration level.
+
+---
+
+## Part 2 — Normative Specification (RFC 2119)
+
+> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).
+
+### MCP Tool Response Content
+
+1. On subprocess success (exit code 0), the `compile` MCP tool **MUST** populate its `TextContent` response exclusively from the subprocess's stdout stream.
+2. The `compile` MCP tool **MUST NOT** include any bytes read from the subprocess's stderr stream in the tool result `content` field.
+3. The tool response text **MUST** be valid JSON as produced by the subprocess; implementations **MUST NOT** append additional text before or after the subprocess stdout output.
+4. Implementations **MAY** log subprocess stderr to a diagnostic sink (e.g., a container log or test buffer) that is separate from the JSON-RPC stdio channel.
+
+### MCP Server Stdio Channel Purity
+
+1. All output written to the MCP server process's stdout **MUST** be valid JSON-RPC 2.0 messages.
+2. The MCP server process **MUST NOT** write any non-JSON-RPC content (log lines, progress messages, debug output) to its stdout.
+3. Diagnostic output from the MCP server process itself **SHOULD** be directed to stderr or a dedicated log file, never to stdout.
+
+### Test Coverage
+
+1. Any change to `compile` tool output stream handling **MUST** be accompanied by a unit test that verifies stdout-only semantics using a mocked subprocess command.
+2. Any change to MCP server stdio stream handling **SHOULD** be covered by a binary-level integration test that validates raw stdout traffic is valid JSON-RPC 2.0 throughout the full initialize–call–shutdown lifecycle.
+3. Integration tests that depend on a compiled binary **MAY** skip (using `t.Skip`) when the binary is not present, but **MUST NOT** silently pass.
+
+### Conformance
+
+An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Specifically: the `compile` tool result text contains only subprocess stdout bytes on success, the MCP server stdout channel carries only valid JSON-RPC 2.0 messages, and both a unit test and an integration test exist to enforce these invariants. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance.
+
+---
+
+*This is a DRAFT ADR generated by the [Design Decision Gate](https://github.com/github/gh-aw/actions/runs/24594338493) workflow. The PR author must review, complete, and finalize this document before the PR can merge.*
diff --git a/pkg/cli/mcp_server_stdio_integration_test.go b/pkg/cli/mcp_server_stdio_integration_test.go
@@ -3,9 +3,13 @@
 package cli
 
 import (
+	"bufio"
 	"bytes"
 	"context"
+	"encoding/json"
 	"errors"
+	"fmt"
+	"io"
 	"os"
 	"os/exec"
 	"path/filepath"
@@ -84,3 +88,175 @@ engine: copilot
 	}
 
 }
+
+func TestMCPServer_CompileAllWorkflows_StdoutOnlyJSONRPC(t *testing.T) {
+	binaryPath := "../../gh-aw"
+	if _, err := os.Stat(binaryPath); os.IsNotExist(err) {
+		t.Skip("Skipping test: gh-aw binary not found. Run 'make build' first.")
+	}
+
+	tmpDir := testutil.TempDir(t, "mcp-stdio-rpc-*")
+	workflowsDir := filepath.Join(tmpDir, ".github", "workflows")
+	if err := os.MkdirAll(workflowsDir, 0755); err != nil {
+		t.Fatalf("Failed to create workflows directory: %v", err)
+	}
+
+	workflowPath := filepath.Join(workflowsDir, "test.md")
+	workflowContent := `---
+on: push
+engine: copilot
+---
+# Test Workflow
+`
+	if err := os.WriteFile(workflowPath, []byte(workflowContent), 0644); err != nil {
+		t.Fatalf("Failed to write workflow file: %v", err)
+	}
+
+	if err := initTestGitRepo(tmpDir); err != nil {
+		t.Fatalf("Failed to initialize git repository: %v", err)
+	}
+
+	originalDir, err := os.Getwd()
+	if err != nil {
+		t.Fatalf("Failed to get current working directory: %v", err)
+	}
+
+	absBinaryPath := filepath.Join(originalDir, binaryPath)
+	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Second)
+
+	cmd := exec.CommandContext(ctx, absBinaryPath, "mcp-server", "--cmd", absBinaryPath)
+	cmd.Dir = tmpDir
+	cmd.Env = os.Environ()
+
+	stdin, err := cmd.StdinPipe()
+	if err != nil {
+		t.Fatalf("Failed to open stdin pipe: %v", err)
+	}
+	stdout, err := cmd.StdoutPipe()
+	if err != nil {
+		t.Fatalf("Failed to open stdout pipe: %v", err)
+	}
+	var stderr bytes.Buffer
+	cmd.Stderr = &stderr
+
+	if err := cmd.Start(); err != nil {
+		t.Fatalf("Failed to start MCP server: %v", err)
+	}
+	defer func() {
+		cancel()
+		_ = stdin.Close()
+		_ = cmd.Wait()
+	}()
+
+	reader := bufio.NewScanner(stdout)
+	reader.Buffer(make([]byte, 1024), 1024*1024)
+
+	sendJSONRPCMessage(t, stdin, map[string]any{
+		"jsonrpc": "2.0",
+		"id":      1,
+		"method":  "initialize",
+		"params": map[string]any{
+			"protocolVersion": "2025-03-26",
+			"capabilities":    map[string]any{},
+			"clientInfo": map[string]any{
+				"name":    "stdio-test-client",
+				"version": "1.0.0",
+			},
+		},
+	})
+
+	_, _ = waitForJSONRPCResponse(t, reader, 1)
+
+	sendJSONRPCMessage(t, stdin, map[string]any{
+		"jsonrpc": "2.0",
+		"method":  "notifications/initialized",
+		"params":  map[string]any{},
+	})
+
+	sendJSONRPCMessage(t, stdin, map[string]any{
+		"jsonrpc": "2.0",
+		"id":      2,
+		"method":  "tools/call",
+		"params": map[string]any{
+			"name":      "compile",
+			"arguments": map[string]any{},
+		},
+	})
+
+	response, raw := waitForJSONRPCResponse(t, reader, 2)
+	if !strings.Contains(raw, `"jsonrpc":"2.0"`) {
+		t.Fatalf("Expected JSON-RPC response on stdout, got: %s", raw)
+	}
+
+	resultData, ok := response["result"].(map[string]any)
+	if !ok {
+		t.Fatalf("Expected result object in compile response, got: %#v", response["result"])
+	}
+	contentList, ok := resultData["content"].([]any)
+	if !ok || len(contentList) == 0 {
+		t.Fatalf("Expected non-empty content list in compile result, got: %#v", resultData["content"])
+	}
+	firstContent, ok := contentList[0].(map[string]any)
+	if !ok {
+		t.Fatalf("Expected first content item to be object, got: %#v", contentList[0])
+	}
+	text, _ := firstContent["text"].(string)
+	if text == "" {
+		t.Fatal("Expected compile result text content to be non-empty")
+	}
+
+	var compileResult []map[string]any
+	if err := json.Unmarshal([]byte(text), &compileResult); err != nil {
+		t.Fatalf("Expected compile response text to be pure JSON without log pollution, got parse error: %v; text=%q", err, text)
+	}
+
+	sendJSONRPCMessage(t, stdin, map[string]any{
+		"jsonrpc": "2.0",
+		"id":      3,
+		"method":  "shutdown",
+		"params":  map[string]any{},
+	})
+	_, _ = waitForJSONRPCResponse(t, reader, 3)
+}
+
+func sendJSONRPCMessage(t *testing.T, w io.Writer, msg map[string]any) {
+	t.Helper()
+	payload, err := json.Marshal(msg)
+	if err != nil {
+		t.Fatalf("Failed to marshal JSON-RPC message: %v", err)
+	}
+	if _, err := fmt.Fprintf(w, "%s\n", payload); err != nil {
+		t.Fatalf("Failed to write JSON-RPC message: %v", err)
+	}
+}
+
+// waitForJSONRPCResponse reads JSON-RPC lines from stdout until it finds the
+// response with the expected ID. It returns the decoded response object and the
+// raw JSON line as read from stdout.
+func waitForJSONRPCResponse(t *testing.T, scanner *bufio.Scanner, expectedID int) (response map[string]any, rawLine string) {
+	t.Helper()
+
+	for scanner.Scan() {
+		line := scanner.Text()
+		var msg map[string]any
+		if err := json.Unmarshal([]byte(line), &msg); err != nil {
+			t.Fatalf("Non-JSON output detected on MCP stdout (stdout must be JSON-RPC only): %q (err=%v)", line, err)
+		}
+
+		if version, ok := msg["jsonrpc"].(string); !ok || version != "2.0" {
+			t.Fatalf("Non-JSON-RPC output detected on MCP stdout: %q", line)
+		}
+
+		if rawID, ok := msg["id"]; ok {
+			if messageID, ok := rawID.(float64); ok && int(messageID) == expectedID {
+				return msg, line
+			}
+		}
+	}
+
+	if err := scanner.Err(); err != nil {
+		t.Fatalf("Failed while reading MCP stdout: %v", err)
+	}
+	t.Fatalf("Did not receive JSON-RPC response for id=%d", expectedID)
+	return nil, ""
+}
diff --git a/pkg/cli/mcp_tools_output_streams_test.go b/pkg/cli/mcp_tools_output_streams_test.go
@@ -0,0 +1,51 @@
+//go:build !integration
+
+package cli
+
+import (
+	"context"
+	"os/exec"
+	"testing"
+
+	"github.com/modelcontextprotocol/go-sdk/mcp"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+)
+
+func mockCommandWithOutput(stdoutText, stderrText string) execCmdFunc {
+	return func(ctx context.Context, args ...string) *exec.Cmd {
+		return exec.CommandContext(ctx, "sh", "-c", `printf '%s' "$1"; printf '%s' "$2" 1>&2`, "sh", stdoutText, stderrText)
+	}
+}
+
+func extractTextResult(t *testing.T, result *mcp.CallToolResult) string {
+	t.Helper()
+	require.NotNil(t, result, "Tool result should not be nil")
+	require.NotEmpty(t, result.Content, "Tool result should contain content")
+
+	textContent, ok := result.Content[0].(*mcp.TextContent)
+	require.True(t, ok, "Tool result content should be text")
+	return textContent.Text
+}
+
+func TestCompileTool_UsesOnlyStdoutOnSuccess(t *testing.T) {
+	const (
+		expectedStdout = `[{"workflow":"test.md","valid":true,"errors":[],"warnings":[]}]`
+		stderrNoise    = "diagnostic noise should not be returned"
+	)
+
+	server := mcp.NewServer(&mcp.Implementation{Name: "test", Version: "1.0"}, nil)
+	err := registerCompileTool(server, mockCommandWithOutput(expectedStdout, stderrNoise), "")
+	require.NoError(t, err, "registerCompileTool should succeed")
+
+	session := connectInMemory(t, server)
+	result, err := session.CallTool(context.Background(), &mcp.CallToolParams{
+		Name:      "compile",
+		Arguments: map[string]any{},
+	})
+	require.NoError(t, err, "compile tool call should succeed")
+
+	output := extractTextResult(t, result)
+	assert.JSONEq(t, expectedStdout, output, "compile tool should return subprocess stdout only")
+	assert.NotContains(t, output, stderrNoise, "compile tool output should not contain stderr noise")
+}